Microsoft Sequence Clustering and Association Rules

Overview
  • Introduction
  • DMX queries
  • Interpreting the sequence clustering model
  • Microsoft Sequence Clustering algorithm principles and parameters
  • Markov chain model
  • Introduction to Microsoft Association Rules
  • Association algorithm principles and parameters

Microsoft Sequence Clustering and Association Rules
The Microsoft Sequence Clustering algorithm is a sequence analysis algorithm provided by Microsoft SQL Server Analysis Services. The algorithm finds the most common sequences by grouping, or clustering, sequences that are similar.
Examples:
  • Data that describes the click paths that are created when users navigate or browse a Web site.
  • Data that describes the order in which a customer adds items to a shopping cart at an online retailer.

DMX Queries
By querying the data mining schema rowset, you can find various kinds of information about the model, such as:
  • Basic metadata
  • The date and time that the model was created and last processed
  • The name of the mining structure that the model is based on
  • The column used as the predictable attribute

DMX Queries
The following query returns the parameters that were used to build and train the sample model:

    SELECT MINING_PARAMETERS
    FROM $system.DMSCHEMA_MINING_MODELS
    WHERE MODEL_NAME = 'Sequence Clustering'

DMX Queries: Getting a List of Sequences for a State

    SELECT FLATTENED NODE_UNIQUE_NAME,
      (SELECT ATTRIBUTE_VALUE AS [Product 1],
              [Support] AS [Sequence Support],
              [Probability] AS [Sequence Probability]
       FROM NODE_DISTRIBUTION) AS t
    FROM [Sequence Clustering].CONTENT
    WHERE NODE_TYPE = 13
      AND [PARENT_UNIQUE_NAME] = 0

This query returns the complete list of first states in the model, before the sequences are grouped into clusters. It does so by returning the sequence nodes (NODE_TYPE = 13) whose parent is the model root node (PARENT_UNIQUE_NAME = 0). The FLATTENED keyword makes the results easier to read. A sample result of this query is shown in the next figure.

DMX Queries
You reference the value returned for NODE_UNIQUE_NAME to get the ID of the node that contains all sequences for the model. You then pass this value to a follow-up query as the ID of the parent node, to get only the transitions included in this node, which happens to contain a list of all sequences for the model.
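
As a rough sketch of the "pass this value to the query" step, the Python helper below only assembles the text of such a follow-up DMX query. The function name `transitions_query`, the example node ID `1081327`, and the use of NODE_TYPE = 14 for transition rows are assumptions made for illustration and do not come from the slides; check them against your own model content before relying on them.

```python
def transitions_query(model_name: str, parent_node_id: str, node_type: int = 14) -> str:
    """Build the text of a DMX content query for the children of one node.

    parent_node_id is the NODE_UNIQUE_NAME value returned by the earlier query.
    node_type defaults to 14, assumed here to identify transition rows.
    """
    return (
        "SELECT NODE_UNIQUE_NAME "
        f"FROM [{model_name}].CONTENT "
        f"WHERE NODE_TYPE = {node_type} "
        f"AND [PARENT_UNIQUE_NAME] = '{parent_node_id}'"
    )


# Hypothetical node ID, used only to show the shape of the generated query.
query = transitions_query("Sequence Clustering", "1081327")
print(query)
```

The helper only builds a string; sending it to Analysis Services is left to whatever client you already use.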

Interpreting the sequence clustering model
A sequence clustering model has a single parent node that represents the model and its metadata. The parent node, which is labeled (All), has a related sequence node that lists all the transitions that were detected in the training data.
The algorithm also creates a number of clusters, based on the transitions that were found in the data and any other input attributes included when creating the model. Each cluster contains its own sequence node that lists only the transitions that were used in generating that specific cluster.

[Figure: Interpreting the sequence clustering model]

Microsoft Sequence Clustering Algorithm Principles
The Microsoft Sequence Clustering algorithm is a hybrid algorithm that combines clustering techniques with Markov chain analysis to identify clusters and their sequences. The sequence data it works on typically represents a series of events or transitions between states in a dataset.
The algorithm examines all transition probabilities and measures the differences, or distances, between all the possible sequences in the dataset to determine which sequences are the best to use as inputs for clustering. After the algorithm has created the list of candidate sequences, it uses the sequence information as an input for the EM (Expectation Maximization) method of clustering.
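
To make "transition probabilities" concrete, here is a minimal sketch that estimates them from raw sequences by counting adjacent pairs. It covers only the Markov chain bookkeeping, not the distance measurement or the EM clustering step, and it is not Microsoft's implementation. The click paths and the function name `transition_probabilities` are invented for illustration.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequences):
    """Estimate P(next state | current state) by counting adjacent pairs."""
    counts = defaultdict(Counter)
    for seq in sequences:
        # Each neighbouring pair (prev, nxt) is one observed transition.
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    # Normalise each state's outgoing counts so they sum to 1.
    return {
        state: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for state, c in counts.items()
    }


# Made-up click paths through a Web site.
paths = [
    ["Home", "Products", "Cart"],
    ["Home", "Products", "Home"],
    ["Home", "Cart"],
]
probs = transition_probabilities(paths)
print(probs["Home"])
```

In this toy data, "Home" is followed by "Products" in two of its three observed transitions, so the estimate for that transition is 2/3.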

Markov chain model
A Markov chain also contains a matrix of transition probabilities. The transitions emanating from a given state define a distribution over the possible next states. The equation $P(x_i = G \mid x_{i-1} = A) = 0.15$ means that, given the current state A, the probability of the next state being G is 0.15.
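
Because the transitions leaving a state form a probability distribution, the corresponding row of the transition matrix sums to 1. Written out for state A, with $S$ denoting the set of all states (notation introduced here for illustration only):

```latex
\sum_{s \in S} P(x_i = s \mid x_{i-1} = A) = 1
```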

Markov chain model
Based on the Markov chain, for any given sequence $x = \{x_1, x_2, x_3, \ldots, x_L\}$ of length $L$, you can calculate the probability of the sequence as follows:

$$P(x) = P(x_L, x_{L-1}, \ldots, x_1) = P(x_L \mid x_{L-1}, \ldots, x_1)\, P(x_{L-1} \mid x_{L-2}, \ldots, x_1) \cdots P(x_1)$$

In a first-order Markov chain, the probability of each state $x_i$ depends only on the state $x_{i-1}$:

$$P(x) = P(x_L, x_{L-1}, \ldots, x_1) = P(x_L \mid x_{L-1})\, P(x_{L-1} \mid x_{L-2}) \cdots P(x_2 \mid x_1)\, P(x_1)$$
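
The first-order formula can be evaluated directly, as in the sketch below. The two-state chain and every number in it are invented for illustration, except that P(G | A) is set to 0.15 to echo the earlier example. The function name `sequence_probability` is likewise hypothetical.

```python
def sequence_probability(seq, initial, transition):
    """P(x) = P(x1) * P(x2|x1) * ... * P(xL|xL-1) for a first-order chain."""
    if not seq:
        raise ValueError("sequence must contain at least one state")
    p = initial[seq[0]]  # P(x1)
    for prev, cur in zip(seq, seq[1:]):
        p *= transition[prev][cur]  # P(x_i | x_{i-1})
    return p


# Illustrative initial distribution and transition matrix over states A and G.
initial = {"A": 0.5, "G": 0.5}
transition = {
    "A": {"A": 0.85, "G": 0.15},
    "G": {"A": 0.40, "G": 0.60},
}

p = sequence_probability(["A", "G", "G"], initial, transition)
print(p)  # P(A) * P(G|A) * P(G|G) = 0.5 * 0.15 * 0.6
```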

Microsoft Sequence Clustering Parameters
  • CLUSTER_COUNT specifies the approximate number of clusters to be built by the algorithm. Setting the CLUSTER_COUNT parameter to 0 causes the algorithm to use heuristics to best determine the number of clusters to build. The default is 10.
  • MAXIMUM_STATES specifies the maximum number of states for a non-sequence attribute that the algorithm supports. The default is 100.
  • MINIMUM_SUPPORT specifies the minimum number of cases that is required in support of an attribute to create a cluster. The default is 10.
  • MAXIMUM_SEQUENCE_STATES specifies the maximum number of states that a sequence can have. The default is 64.

Introduction to Microsoft Association Rules
The Microsoft Association Rules Viewer in Microsoft SQL Server Analysis Services displays mining models that are built with the Microsoft Association algorithm.
The Microsoft Association algorithm is an association algorithm provided by Analysis Services that is useful for recommendation engines. A recommendation engine recommends products to customers based on items they have already bought, or in which they have indicated an interest. The Microsoft Association algorithm is also useful for market basket analysis.

Structure of an Association Model
The top level has a single node (Model Root) that represents the model. The second level contains nodes that represent qualified item sets and rules.

Association Algorithm Principles
The Microsoft Association Rules algorithm belongs to the Apriori association family. The two steps in the Microsoft Association Rules algorithm are:
  1. Find frequent item sets. This is the calculation-intensive phase.
  2. Generate association rules based on the frequent item sets.

Association Algorithm Parameters
  • MINIMUM_SUPPORT is the minimum support that an itemset must have to qualify as a frequent itemset. Its value is within the range of 0 to 1. The default value is 0.03.
  • MAXIMUM_SUPPORT is the maximum support that a frequent itemset may have; it can be used to filter out items that are too frequent to be informative. Its value is within the range of 0 to 1. The default value is 1.
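
The two steps can be sketched with a textbook Apriori loop. This is a generic illustration under simplifying assumptions (single-item consequents, support expressed as a fraction of cases), not the Analysis Services implementation. The baskets, the thresholds, and the function names are all made up for the example.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Step 1: find every itemset whose support is at least min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)
    support = {}
    # Start with all single-item candidates.
    current = {frozenset([item]) for b in baskets for item in b}
    k = 1
    while current:
        # Count each candidate and keep only the frequent ones.
        level = {}
        for cand in current:
            s = sum(1 for b in baskets if cand <= b) / n
            if s >= min_support:
                level[cand] = s
        support.update(level)
        k += 1
        # Join frequent itemsets into candidates that are one item larger.
        joined = {a | b for a, b in combinations(list(level), 2) if len(a | b) == k}
        # Prune candidates that have an infrequent (k-1)-item subset.
        current = {
            c for c in joined
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
    return support


def association_rules(support, min_probability):
    """Step 2: derive rules (antecedent -> item) from the frequent itemsets."""
    rules = []
    for itemset, s in support.items():
        if len(itemset) < 2:
            continue
        for item in itemset:
            antecedent = itemset - {item}
            probability = s / support[antecedent]  # P(item | antecedent)
            if probability >= min_probability:
                rules.append((antecedent, item, probability))
    return rules


transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
support = frequent_itemsets(transactions, min_support=0.5)
rules = association_rules(support, min_probability=0.6)
for antecedent, item, probability in rules:
    print(sorted(antecedent), "->", item, round(probability, 3))
```

Step 1 dominates the cost because every candidate itemset is counted against every basket, which matches the description of it as the calculation-intensive phase.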

Association Algorithm Parameters
  • MINIMUM_PROBABILITY is a threshold parameter. It defines the minimum probability for an association rule. Its value is within the range of 0 to 1. The default value is 0.4.
  • MINIMUM_IMPORTANCE is a threshold parameter for association rules. Rules with importance less than MINIMUM_IMPORTANCE are filtered out.
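
The sketch below shows how two such thresholds might filter candidate rules. The slides do not define importance, so the formula used here, log10 of P(B | A) divided by P(B | not A), is an assumption based on the commonly cited definition for this algorithm. The baskets, the function name `rule_scores`, and the importance threshold of 0.0 are likewise illustrative.

```python
import math


def rule_scores(transactions, antecedent, consequent):
    """Return (probability, importance) for the rule antecedent -> consequent."""
    baskets = [frozenset(t) for t in transactions]
    antecedent = frozenset(antecedent)
    with_a = [b for b in baskets if antecedent <= b]
    without_a = [b for b in baskets if not antecedent <= b]
    if not with_a or not without_a:
        raise ValueError("need cases both with and without the antecedent")
    p_given_a = sum(consequent in b for b in with_a) / len(with_a)
    p_given_not_a = sum(consequent in b for b in without_a) / len(without_a)
    # Assumed definition: importance = log10(P(B|A) / P(B|not A)).
    if p_given_a == 0:
        importance = -math.inf
    elif p_given_not_a == 0:
        importance = math.inf
    else:
        importance = math.log10(p_given_a / p_given_not_a)
    return p_given_a, importance


transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
MINIMUM_PROBABILITY = 0.4   # default quoted above
MINIMUM_IMPORTANCE = 0.0    # illustrative value, not a documented default

candidates = [({"butter"}, "bread"), ({"milk"}, "bread")]
kept = []
for antecedent, consequent in candidates:
    probability, importance = rule_scores(transactions, antecedent, consequent)
    if probability >= MINIMUM_PROBABILITY and importance >= MINIMUM_IMPORTANCE:
        kept.append((sorted(antecedent), consequent))
print(kept)
```

Both candidate rules pass the probability threshold, but milk -> bread has negative importance under this definition (bread is more likely without milk than with it in this data), so only butter -> bread is kept.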

Association Algorithm Parameters
  • MAXIMUM_ITEMSET_SIZE specifies the maximum size of an itemset. The default value is 0, which means that there is no size limit on the itemset.
  • MINIMUM_ITEMSET_SIZE specifies the minimum size of the itemset. The default value is 0.
  • MAXIMUM_ITEMSET_COUNT defines the maximum number of item sets.

Association Algorithm Parameters
  • OPTIMIZED_PREDICTION_COUNT defines the number of items to be cached to optimize predictions.
  • AUTODETECT_MINIMUM_SUPPORT represents the sensitivity of the algorithm used to autodetect minimum support. To automatically detect the smallest appropriate value of minimum support, set this value to 1.0. To turn off autodetection, set this value to 0.

Summary
  • Introduction to sequence clustering
  • DMX queries
  • The sequence clustering model
  • Microsoft Sequence Clustering algorithm principles and parameters
  • Markov chain model
  • Introduction to Microsoft Association Rules
  • Association algorithm principles and parameters

Visit more self-help tutorials
Pick a tutorial of your choice and browse through it at your own pace. The tutorials section is free, self-guiding, and will not involve any additional support. Visit us at www.dataminingtools.net

MS SQL SERVER: Microsoft sequence clustering and association rules
