Data Mining Tutorial
Welcome to the Microsoft SQL Server 2008 Analysis Services (SSAS) Basic Data Mining Tutorial. Microsoft SQL
Server provides an integrated environment for creating and working with data mining models. In this Basic Data
Mining Tutorial, you will complete a scenario for a targeted mailing campaign in which you create three models for
analyzing customer purchasing behavior and targeting potential buyers. The tutorial demonstrates how to use the
data mining algorithms, mining model viewers, and data mining tools that are included in Microsoft SQL Server
Analysis Services. The fictitious company, Adventure Works Cycles, is used for all examples.
When you are comfortable using the data mining tools, we recommend that you also complete the Intermediate
Data Mining Tutorial, which demonstrates how to use forecasting, market basket analysis, time series, association
models, nested tables, and sequence clustering.
Tutorial Scenario
In this tutorial, you are an employee of Adventure Works Cycles who has been tasked with learning more about the
company's customers based on historical purchases, and then using that historical data to make predictions that
can be used in marketing. The company has never done data mining before, so you must create a new database
specifically for data mining and set up several data mining models.
What You Will Learn
This tutorial teaches you how to create and work with several different types of data mining models. It also teaches
you how to create a copy of a mining model, and apply a filter to the mining model. You then process the new
model and evaluate the model using a lift chart. After the model is complete, you use drillthrough to retrieve
additional data from the underlying mining structure.
In SQL Server 2008, Microsoft provides several new features that help you develop custom data mining models and
use the results more effectively.
Holdout Test Sets - When you create a mining structure, you can now divide the data in the mining
structure into training and testing sets.
Mining model filters - You can now attach filters to a mining model, and apply the filter during both
training and testing.
Drillthrough to Structure Cases and Structure Columns - You can now easily move from the general
patterns in the mining model to actionable detail in the data source.
Microsoft Clustering
Lesson 4: Exploring the Targeted Mailing Models (Basic Data Mining Tutorial)
In this lesson, you will learn how to explore and interpret the findings of each model by using the viewers.
Lesson 5: Testing Models (Basic Data Mining Tutorial)
In this lesson, you make a copy of one of the targeted mailing models, add a mining model filter to restrict
the training data to a particular set of customers, and then assess the viability of the model.
Lesson 6: Creating and Working with Predictions (Basic Data Mining Tutorial)
In this final lesson of the Basic Data Mining Tutorial, you use the model to predict which customers are
most likely to purchase a bike. You then drill through to the underlying cases to obtain contact
information.
Requirements
Make sure that the following are installed:
To enhance security, the sample databases are not installed with SQL Server 2008. To install the official databases
for Microsoft SQL Server, visit the Microsoft SQL Sample Databases page.
Time series models, to forecast the sales of products in different regions around the world. You will
develop individual models for each region and also a general model that can be used for cross-prediction.
Association model, to analyze groupings of products that are purchased during visits to the Adventure
Works Cycles e-commerce site. Based on this market basket model, you might recommend products to
customers.
Sequence clustering model, to analyze the order in which customers buy products. Based on this model,
you can plan changes in Web site design or new product offerings.
In this lesson, you will create a new project based on the AdventureWorksDW2008 database, to
support several new data source views and many more mining models.
Lesson 2: Building a Forecasting Scenario (Intermediate Data Mining Tutorial)
In this lesson, you will create a mining model that can be used as part of a forecasting scenario. You will
also explore mining models that are built with the Microsoft Time Series algorithm.
You will build models for individual regions, and then build a general model that can be used for cross-prediction.
Lesson 3: Building a Market Basket Scenario (Intermediate Data Mining Tutorial)
In this lesson, you will add a new data source view and learn how to work with nested tables and keys.
Based on this data, you will create a mining model that can be used as part of a market basket scenario.
You will also explore mining models that are built with the Microsoft Association algorithm.
Lesson 4: Building a Sequence Clustering Scenario (Intermediate Data Mining Tutorial)
In this lesson, you will create a mining model that can be used as part of a sequence clustering scenario.
You will also learn how to explore mining models that are built with the Microsoft Sequence Clustering
algorithm.
Lesson 5: Building Neural Network and Logistic Regression Models (Intermediate Data Mining Tutorial)
In this lesson, you will analyze the data from a call center to help improve its customer satisfaction. You
will use the Microsoft Neural Network algorithm to build a mining model to help you understand the data
and the trends in it. You will also build a logistic regression model to make predictions that can be used in
business planning.
Requirements
Make sure that the following are installed:
By default, the sample databases are not installed, to enhance security. To install the official databases for
Microsoft SQL Server, visit the Microsoft SQL Sample Databases page and select AdventureWorksDW2008.
project, or choose only certain kinds of data. If you want to filter the data, you can do so in the data source view,
or in filters that are applied at the level of the model.
How much data you need, and how that data must be cleaned and formatted, depends on the algorithm that you
use to investigate the data.
For More Information: Defining a Data Source View (Analysis Services)
Adding Mining Structures to an Analysis Services Project
Once you have enough data to begin analysis, you select the columns of data that are most relevant to your
business problem, and add mining structures to the project. A mining structure defines the columns of data, and
columns with nested tables, that are obtained from the data source view or from an OLAP cube in the project.
To add a new mining structure, you start the Data Mining Wizard, which walks you through the process of defining
the data and optionally creating an initial data mining model. When you create a structure, you can also partition
your data into a training data set, which is used to build models, and a testing data set, which is used to test
or validate all mining models that are based on that structure. You can use the Mining Structure tab of Data
Mining Designer to modify existing mining structures, including adding columns and nested tables.
For More Information: Creating a New Mining Structure, Data Mining Designer, Data Mining Wizard (Analysis
Services - Data Mining)
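The idea of partitioning a structure into training and testing sets can be sketched outside Analysis Services. The following Python fragment is only an illustration of a random holdout split, not the way Analysis Services implements it. The function name split_holdout, the 30 percent test size, and the seed value are assumptions chosen for the example.

```python
import random

def split_holdout(cases, test_percent=30, seed=42):
    """Randomly split cases into (training, testing) lists.

    test_percent and seed are illustrative parameters, not product settings.
    """
    if not 0 <= test_percent <= 100:
        raise ValueError("test_percent must be between 0 and 100")
    shuffled = list(cases)                  # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)   # fixed seed makes the split repeatable
    n_test = round(len(shuffled) * test_percent / 100)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical cases: ten customers identified only by a key.
cases = [{"CustomerKey": k} for k in range(1, 11)]
training, testing = split_holdout(cases)
print(len(training), len(testing))  # prints "7 3"
```

Because the seed is fixed, the same cases land in the testing set on every run, so every model that is built on the same structure is tested against the same held-out cases.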
Working with Data Mining Models
To each mining structure, you add one or more mining models. The mining model defines the algorithm, or the
method of analysis that you will use on the data. You process each model by running the data in the data source
view through the algorithm, which generates a mathematical model of the data. This process is also known as
training the model.
After the model has been processed, you can then visually explore the mining model and create prediction queries
against it.
Analysis Services provides several options for processing mining model objects, including the ability to control
which objects are processed and how they are processed. For example, you can process a structure and cache the
data, and then continue to add new models to the structure. If the data is cached, you can use drillthrough queries
to return detailed information about the cases used in the model.
For More Information: Data Mining Algorithms (Analysis Services - Data Mining), Processing Analysis Services
Objects, Using Drillthrough on Mining Models and Mining Structures (Analysis Services - Data Mining).
Validating Data Mining Models
After you have created a model, you can investigate the results and make decisions about which models perform
the best. On the Mining Model Viewer tab in Data Mining Designer, Analysis Services provides viewers for each
mining model type, which you can use to explore the mining models.
On the Mining Accuracy Chart tab of the designer, Analysis Services provides tools that you can use to directly
compare mining models and choose the most accurate or useful mining model. These tools include a lift chart, a
profit chart, and a classification matrix.
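To make these two measures concrete, the following Python sketch computes a classification matrix and the points of a cumulative lift curve from a small set of test cases. It is a simplified illustration, not the calculation that the designer performs. The sample values, the 0.5 probability cutoff, and the function names are assumptions.

```python
from collections import Counter

def classification_matrix(actual, predicted):
    """Count how often each (predicted, actual) pair occurs."""
    return Counter(zip(predicted, actual))

def cumulative_lift(actual, scores, target=1):
    """Return (fraction of population, fraction of target cases captured)
    when cases are contacted in order of descending score."""
    ranked = [a for _, a in sorted(zip(scores, actual),
                                   key=lambda pair: pair[0], reverse=True)]
    total = sum(1 for a in ranked if a == target)
    if total == 0:
        raise ValueError("no cases with the target value")
    captured, points = 0, []
    for i, a in enumerate(ranked, start=1):
        if a == target:
            captured += 1
        points.append((i / len(ranked), captured / total))
    return points

# Hypothetical test cases: 1 = bought a bike, 0 = did not.
actual = [1, 1, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]   # predicted probability of 1
predicted = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 cutoff is an assumption

matrix = classification_matrix(actual, predicted)
print(matrix[(1, 1)], matrix[(1, 0)], matrix[(0, 0)], matrix[(0, 1)])  # prints "3 1 4 0"

points = cumulative_lift(actual, scores)
print(points[3])  # prints "(0.5, 1.0)"
```

In this sample, contacting the top-scored half of the population reaches all of the buyers, whereas a random selection of the same size would be expected to reach about half of them. That gap is what a lift chart displays.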
You can also use the cross-validation report, new in SQL Server 2008, to perform iterative subsampling of your
data to determine whether the model is biased toward a particular set of data. The statistics that the report provides
can be used to objectively compare models and assess the quality of your training data.
For More Information: Viewing a Data Mining Model, Validating Data Mining Models (Analysis Services - Data
Mining)
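The principle behind cross-validation can be shown with a short Python sketch: the cases are divided into folds, each fold is held out in turn, and a measure is computed for each fold. This is a generic k-fold illustration and does not reproduce the statistics in the Analysis Services report. The majority-class "model", the fold assignment, and the sample labels are assumptions made to keep the example self-contained.

```python
from collections import Counter
from statistics import mean, pstdev

def k_fold_indices(n_cases, folds):
    """Yield (train_indices, test_indices); fold i tests every case whose
    index leaves remainder i when divided by the number of folds."""
    if folds < 2 or folds > n_cases:
        raise ValueError("folds must be between 2 and the number of cases")
    for i in range(folds):
        test = [j for j in range(n_cases) if j % folds == i]
        train = [j for j in range(n_cases) if j % folds != i]
        yield train, test

def cross_validate(labels, folds=3):
    """Return per-fold accuracy of a baseline that always predicts the
    most common label seen in the training folds."""
    accuracies = []
    for train, test in k_fold_indices(len(labels), folds):
        majority = Counter(labels[j] for j in train).most_common(1)[0][0]
        hits = sum(1 for j in test if labels[j] == majority)
        accuracies.append(hits / len(test))
    return accuracies

# Hypothetical labels: 1 = bought a bike, 0 = did not.
labels = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1]
accuracies = cross_validate(labels, folds=3)
print(accuracies)        # prints "[0.75, 1.0, 0.5]"
print(mean(accuracies))  # prints "0.75"
print(pstdev(accuracies))
```

A large spread between folds, as in this sample, suggests that the result depends heavily on which cases happen to be used for training.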
Creating Predictions
The main goal of most data mining projects is to use a mining model to create predictions. After you explore and
compare mining models, you can use one of several tools to create predictions. Analysis Services provides a query
language called Data Mining Extensions (DMX) that is the basis for creating predictions and is easily scriptable. To
help you build DMX prediction queries, SQL Server provides a query builder, available in SQL Server Management
Studio and Business Intelligence Development Studio, and DMX templates for the query editor in Management
Studio. Within BI Development Studio, you access the query builder from the Mining Model Prediction tab of
Data Mining Designer.
For More Information: Creating DMX Prediction Queries, Data Mining Extensions (DMX) Statement Reference
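Because DMX statements are plain text, a prediction query can be generated from a script. The following Python sketch composes a DMX PREDICTION JOIN statement as a string. It does not connect to a server or run the query. The model name, data source name, table, and column names are placeholders chosen for illustration and may not match the objects in your project.

```python
def build_prediction_query(model, data_source, source_sql,
                           predict_column, key_column, join_columns):
    """Compose a DMX PREDICTION JOIN statement as text.

    Object names are wrapped in brackets as given; names that themselves
    contain a closing bracket are not handled in this sketch.
    """
    on_clause = " AND\n  ".join(
        f"[{model}].[{col}] = t.[{col}]" for col in join_columns
    )
    escaped_sql = source_sql.replace("'", "''")  # double any single quotes
    return (
        "SELECT\n"
        f"  t.[{key_column}],\n"
        f"  Predict([{predict_column}]) AS [Predicted],\n"
        f"  PredictProbability([{predict_column}]) AS [Probability]\n"
        f"FROM [{model}]\n"
        "PREDICTION JOIN\n"
        f"  OPENQUERY([{data_source}], '{escaped_sql}') AS t\n"
        f"ON\n  {on_clause}"
    )

# All names below are hypothetical placeholders.
query = build_prediction_query(
    model="TM Decision Tree",
    data_source="Adventure Works DW",
    source_sql="SELECT CustomerKey, Age, Gender FROM dbo.ProspectiveBuyer",
    predict_column="Bike Buyer",
    key_column="CustomerKey",
    join_columns=["Age", "Gender"],
)
print(query)
```

The generated text can then be pasted into a DMX query window or reviewed before it is run. The query builder described above remains the simpler route when you are working interactively.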
SQL Server Management Studio
After you have used BI Development Studio to build mining models for your data mining project, you can manage
and work with the models and create predictions in Management Studio. By using the query tools in SQL Server
Management Studio, you can explore the data in your models, create complex content queries, or manage data
mining objects stored in an instance of SQL Server.
For More Information: Data Mining in SQL Server Management Studio