Data Mining Tutorial0

This document provides an introduction and overview of the Basic Data Mining Tutorial. The tutorial teaches how to use SQL Server 2008 Analysis Services to create data mining models for a targeted marketing campaign. It includes lessons on preparing the database, building models using decision trees, clustering, and Naive Bayes algorithms, and exploring and testing the models. The overall goal is to analyze customer purchase data to identify potential buyers and target a mailing campaign.

Uploaded by

subbaraju123
Copyright
© All Rights Reserved

Basic Data Mining Tutorial

Welcome to the Microsoft SQL Server 2008 Analysis Services (SSAS) Basic Data Mining Tutorial. Microsoft SQL
Server provides an integrated environment for creating and working with data mining models. In this Basic Data
Mining Tutorial, you will complete a scenario for a targeted mailing campaign in which you create three models for
analyzing customer purchasing behavior and targeting potential buyers. The tutorial demonstrates how to use the
data mining algorithms, mining model viewers, and data mining tools that are included in Microsoft SQL Server
Analysis Services. The fictitious company, Adventure Works Cycles, is used for all examples.
When you are comfortable using the data mining tools, we recommend that you also complete the Intermediate
Data Mining Tutorial, which demonstrates how to use forecasting, market basket analysis, time series, association
models, nested tables, and sequence clustering.
Tutorial Scenario
In this tutorial, you are an employee of Adventure Works Cycles who has been tasked with learning more about the
company's customers based on historical purchases, and then using that historical data to make predictions that
can be used in marketing. The company has never done data mining before, so you must create a new database
specifically for data mining and set up several data mining models.
What You Will Learn
This tutorial teaches you how to create and work with several different types of data mining models. It also teaches
you how to create a copy of a mining model, and apply a filter to the mining model. You then process the new
model and evaluate the model using a lift chart. After the model is complete, you use drillthrough to retrieve
additional data from the underlying mining structure.
In SQL Server 2008, Microsoft provides several new features that help you develop custom data mining models and
use the results more effectively.

Holdout Test Sets - When you create a mining structure, you can now divide the data in the mining
structure into training and testing sets.

Mining model filters - You can now attach filters to a mining model, and apply the filter during both
training and testing.

Drillthrough to Structure Cases and Structure Columns - You can now easily move from the general
patterns in the mining model to actionable detail in the data source.

This tutorial is divided into the following lessons:


Lesson 1: Preparing the Analysis Services Database (Basic Data Mining Tutorial)
In this lesson, you will learn how to create a new Analysis Services database, add a data source and data
source view, and prepare the new database to be used with data mining.
Lesson 2: Building a Targeted Mailing Structure (Basic Data Mining Tutorial)
In this lesson, you will learn how to create a mining model structure that can be used as part of a targeted
mailing scenario.
Lesson 3: Adding and Processing Models
In this lesson, you will learn how to add models to a structure. The models you create are built with the
following algorithms:

Microsoft Decision Trees

Microsoft Clustering

Microsoft Naive Bayes

Lesson 4: Exploring the Targeted Mailing Models (Basic Data Mining Tutorial)
In this lesson, you will learn how to explore and interpret the findings of each model using the viewers.
Lesson 5: Testing Models (Basic Data Mining Tutorial)
In this lesson, you make a copy of one of the targeted mailing models, add a mining model filter to restrict
the training data to a particular set of customers, and then assess the viability of the model.
Lesson 6: Creating and Working with Predictions (Basic Data Mining Tutorial)
In this final lesson of the Basic Data Mining Tutorial, you use the model to predict which customers are
most likely to purchase a bike. You then drill through to the underlying cases to obtain contact
information.

Requirements
Make sure that the following are installed:

Microsoft SQL Server 2008

Microsoft SQL Server Analysis Services

The AdventureWorksDW2008 database.

To enhance security, the sample databases are not installed with SQL Server 2008. To install the official databases
for Microsoft SQL Server, visit the Microsoft SQL Sample Databases page.

Intermediate Data Mining Tutorials


(Analysis Services - Data Mining)
Microsoft Analysis Services provides an integrated environment for creating and working with data mining models.
You can easily bind to data sources, create and test multiple models on the same data, and deploy models for use
in predictive analysis.
In the Basic Data Mining Tutorial, you learned how to use Business Intelligence Development Studio to create a
data mining solution, and you built three models to support a targeted mailing campaign for analyzing customer
purchasing behavior and for targeting potential buyers.
In this tutorial, you are expected to be familiar with the data mining tools and with the mining model viewers that
were introduced in the Basic Data Mining Tutorial. This intermediate tutorial builds on that experience and
introduces several new scenarios, including forecasting and market basket analysis. In this tutorial, you will learn
how to create a time series model, an association model, and a sequence clustering model. You will also learn how
to use nested tables in a model, and how to create filters on nested tables.
All scenarios use the AdventureWorksDW2008 data source, but you will create different data source views for
different scenarios. The lessons are independent, so you can complete them in any order as long as you
create the data source first.
Lesson Scenarios
After your success with the targeted mailing campaign, you have been asked to apply your knowledge of data
mining to develop several new models for use in business planning. These include the following new model types:

Time series models, to forecast the sales of products in different regions around the world. You will
develop individual models for each region and also a general model that can be used for cross-prediction.

Association model, to analyze groupings of products that are purchased during visits to the Adventure
Works Cycles e-commerce site. Based on this market basket model, you might recommend products to
customers.

Sequence clustering model, to analyze the order in which customers buy products. Based on this model,
you can plan changes in Web site design or new product offerings.
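
To make the market basket idea more concrete, the following is a minimal Python sketch of two measures commonly used to rank association rules: support and confidence. It does not call Analysis Services and is not the Microsoft Association algorithm. The basket contents and the function names (support, confidence) are hypothetical and chosen only for illustration.

```python
from typing import FrozenSet, List

# Hypothetical baskets: each set holds the products bought in one site visit.
BASKETS: List[FrozenSet[str]] = [
    frozenset({"bike", "helmet", "bottle"}),
    frozenset({"bike", "helmet"}),
    frozenset({"helmet", "gloves"}),
    frozenset({"bike", "bottle"}),
]


def support(items: FrozenSet[str], baskets: List[FrozenSet[str]]) -> float:
    """Fraction of baskets that contain every item in `items`."""
    hits = sum(1 for basket in baskets if items <= basket)
    return hits / len(baskets)


def confidence(lhs: FrozenSet[str], rhs: FrozenSet[str],
               baskets: List[FrozenSet[str]]) -> float:
    """Estimated probability that a basket containing `lhs` also contains `rhs`."""
    lhs_support = support(lhs, baskets)
    if lhs_support == 0:
        return 0.0
    return support(lhs | rhs, baskets) / lhs_support


if __name__ == "__main__":
    bike, helmet = frozenset({"bike"}), frozenset({"helmet"})
    print("support(bike, helmet):", support(bike | helmet, BASKETS))
    print("confidence(bike -> helmet):", confidence(bike, helmet, BASKETS))
```

A rule such as "bike -> helmet" with high support and high confidence is the kind of pattern that could drive a product recommendation.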

What You Will Learn


This tutorial teaches you how to create and work with several types of data mining algorithms. This tutorial also
introduces the following concepts:

Using nested tables to build models

Choosing a nested table key, time series key, or sequence key

Filtering nested tables when creating models or making predictions

Determining whether you have enough data to support a model

Creating a general model and applying it to multiple data sets

This tutorial is divided into the following lessons:


Lesson 1: Creating the Intermediate Data Mining Solution (Intermediate Data Mining Tutorial)

In this lesson, you will create a new project based on the AdventureWorksDW2008 database, to
support several new data source views and many more mining models.
Lesson 2: Building a Forecasting Scenario (Intermediate Data Mining Tutorial)
In this lesson, you will create a mining model that can be used as part of a forecasting scenario. You will
also explore mining models that are built with the Microsoft Time Series algorithm.
You will build models for individual regions, and then build a general model that can be used for cross-prediction.
Lesson 3: Building a Market Basket Scenario (Intermediate Data Mining Tutorial)
In this lesson, you will add a new data source view and learn how to work with nested tables and keys.
Based on this data, you will create a mining model that can be used as part of a market basket scenario.
You will also explore mining models that are built with the Microsoft Association algorithm.
Lesson 4: Building a Sequence Clustering Scenario (Intermediate Data Mining Tutorial)
In this lesson, you will create a mining model that can be used as part of a sequence clustering scenario.
You will also learn how to explore mining models that are built with the Microsoft Sequence Clustering
algorithm.
Lesson 5: Building Neural Network and Logistic Regression Models (Intermediate Data Mining Tutorial)
In this lesson, you will analyze the data from a call center to help improve its customer satisfaction. You
will use the Microsoft Neural Network algorithm to build a mining model to help you understand the data
and the trends in it. You will also build a logistic regression model to make predictions that can be used in
business planning.
Requirements
Make sure that the following are installed:

Microsoft SQL Server 2008

Microsoft SQL Server Analysis Services

The AdventureWorksDW2008 database.

By default, the sample databases are not installed, to enhance security. To install the official databases for
Microsoft SQL Server, visit the Microsoft SQL Sample Databases page and select AdventureWorksDW2008.

Data Mining Projects


(Analysis Services - Data Mining)
When you develop a data mining solution in Analysis Services, you first create an Analysis Services project. Within
this project, you define the source of data that you will use for analysis, and then set up a model that includes an
algorithm and custom instructions for handling the data. You can also continue to test and refine the model within
the project. When you are satisfied with the solution, you can deploy it to another server or use it in an application
to provide predictions and analysis.
The following sections outline the tools and processes for creating a data mining solution, and provide links to
resources to use for each step.
Creating an Analysis Services Project
When you develop a data mining solution, you must first create a new Analysis Services project by using Business
Intelligence Development Studio. Each data mining project contains the following four kinds of objects: data
sources; data source views, which are based on the data sources; mining structures, which define how the data is
used in the model; and mining models, which create and store patterns.
For More Information: Defining an Analysis Services Project, Defining a Data Source Using the Data Source
Wizard (Analysis Services)
Defining a Data Source
The data source defines the connection string and authentication information that the Analysis Services server will
use to connect to the data source. The data source can contain multiple tables or views. Analysis Services can use
datasets from both relational and Online Analytical Processing (OLAP) databases, or from external providers.
After you have defined this connection to a data source, you create a view that identifies the specific data that is
relevant to your model. The data source view also enables you to customize the way that the data in the data
source is supplied to the mining model. You can modify the structure of the data to make it more relevant to your
project, or choose only certain kinds of data. If you want to filter the data, you can do so in the data source view,
or in filters that are applied at the level of the model.
The requirements for how much data you will need, and how that data should be cleaned and formatted, will differ
depending on the algorithm that you use to investigate that data.
For More Information: Defining a Data Source View (Analysis Services)
Adding Mining Structures to an Analysis Services Project
Once you have enough data to begin analysis, you select the columns of data that are most relevant to your
business problem, and add mining structures to the project. A mining structure defines the columns of data, and
columns with nested tables, that are obtained from the data source view or from an OLAP cube in the project.
To add a new mining structure, you start the Data Mining Wizard, which walks you through the process of defining
the data and optionally creating an initial data mining model. When you create a structure, you can also partition
your data to include a training data set, used for building models, and a testing data set, which can be used to test
or validate all mining models that are based on that structure. You can use the Mining Structure tab of Data
Mining Designer to modify existing mining structures, including adding columns and nested tables.
For More Information: Creating a New Mining Structure, Data Mining Designer, Data Mining Wizard (Analysis
Services - Data Mining)
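
The partitioning of a structure into training and testing sets can be pictured with a short Python sketch. It is a conceptual illustration only: Analysis Services performs the split for you, and the function name holdout_split, its parameters, and the 30 percent figure are assumptions made for this example.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def holdout_split(cases: Sequence[T], holdout_percent: float,
                  seed: int = 0) -> Tuple[List[T], List[T]]:
    """Return (training, testing), holding out about holdout_percent of the cases."""
    if not 0 <= holdout_percent <= 100:
        raise ValueError("holdout_percent must be between 0 and 100")
    shuffled = list(cases)
    # A fixed seed makes the partition repeatable between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * holdout_percent / 100)
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    training, testing = holdout_split(range(1000), holdout_percent=30)
    print(len(training), "training cases,", len(testing), "testing cases")
```

Because every model in the structure is tested against the same held-out cases, accuracy figures for different models are directly comparable.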
Working with Data Mining Models
To each mining structure, you add one or more mining models. The mining model defines the algorithm, or the
method of analysis that you will use on the data. You process each model by running the data in the data source
view through the algorithm, which generates a mathematical model of the data. This process is also known as
training the model.
After the model has been processed, you can then visually explore the mining model and create prediction queries
against it.
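
As a rough illustration of what "training" and "predicting" mean, the following Python sketch trains a small naive Bayes classifier by counting attribute values per outcome, and then scores a new case. It is not the Microsoft Naive Bayes implementation. The column names (Commute, Cars, Buyer), the sample cases, and the add-one smoothing are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List

Case = Dict[str, str]

# Hypothetical training cases; real cases would come from the data source view.
CASES: List[Case] = [
    {"Commute": "short", "Cars": "0", "Buyer": "yes"},
    {"Commute": "short", "Cars": "1", "Buyer": "yes"},
    {"Commute": "long", "Cars": "2", "Buyer": "no"},
    {"Commute": "long", "Cars": "1", "Buyer": "no"},
    {"Commute": "short", "Cars": "2", "Buyer": "no"},
    {"Commute": "short", "Cars": "0", "Buyer": "yes"},
]


def train(cases: List[Case], target: str):
    """Count how often each attribute value occurs together with each target value."""
    class_counts: Counter = Counter()
    value_counts = defaultdict(Counter)  # (attribute, label) -> counts per value
    domains = defaultdict(set)           # attribute -> distinct values seen
    for case in cases:
        label = case[target]
        class_counts[label] += 1
        for attribute, value in case.items():
            if attribute == target:
                continue
            value_counts[(attribute, label)][value] += 1
            domains[attribute].add(value)
    return class_counts, value_counts, domains


def predict(model, case: Case) -> str:
    """Return the target value with the highest smoothed log-probability."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = math.log(count / total)
        for attribute, value in case.items():
            if attribute not in domains:
                continue  # ignore columns the model was not trained on
            seen = value_counts[(attribute, label)][value]
            # Add-one smoothing keeps unseen combinations from zeroing the score.
            score += math.log((seen + 1) / (count + len(domains[attribute])))
        scores[label] = score
    return max(scores, key=scores.get)


if __name__ == "__main__":
    model = train(CASES, target="Buyer")
    print(predict(model, {"Commute": "short", "Cars": "0"}))
```

The counts gathered in train play the role of the stored patterns, and predict plays the role of a prediction query against the processed model.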
Analysis Services provides several options for processing mining model objects, including the ability to control
which objects are processed and how they are processed. For example, you can process a structure and cache the
data, and then continue to add new models to the structure. If the data is cached, you can use drillthrough queries
to return detailed information about the cases used in the model.
For More Information: Data Mining Algorithms (Analysis Services - Data Mining), Processing Analysis Services
Objects, Using Drillthrough on Mining Models and Mining Structures (Analysis Services - Data Mining).
Validating Data Mining Models
After you have created a model, you can investigate the results and make decisions about which models perform
the best. On the Mining Model Viewer tab in Data Mining Designer, Analysis Services provides viewers for each
mining model type, which you can use to explore the mining models.
In the Mining Accuracy Chart tab of the designer, Analysis Services provides tools that you can use to directly
compare mining models and choose the most accurate or useful mining model. These tools include a lift chart,
profit chart, and a classification matrix.
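
The ideas behind two of these tools can be sketched in a few lines of Python. The sketch below is not how the designer computes its charts. It only shows the underlying arithmetic, and the function names (classification_matrix, lift_points) and the sample scores are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def classification_matrix(actual: Sequence[str], predicted: Sequence[str]) -> Counter:
    """Count (actual, predicted) pairs; pairs with equal values are correct predictions."""
    return Counter(zip(actual, predicted))


def lift_points(scored: Sequence[Tuple[float, bool]],
                steps: int = 10) -> List[Tuple[float, float]]:
    """Return (fraction of population contacted, fraction of buyers captured)
    when cases are contacted in order of descending predicted probability."""
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    total_buyers = sum(1 for _, is_buyer in ranked if is_buyer)
    points = []
    for step in range(1, steps + 1):
        cutoff = round(len(ranked) * step / steps)
        captured = sum(1 for _, is_buyer in ranked[:cutoff] if is_buyer)
        share = captured / total_buyers if total_buyers else 0.0
        points.append((step / steps, share))
    return points


# Hypothetical (predicted probability, actually bought) pairs from a testing set.
SCORED = [
    (0.9, True), (0.8, True), (0.7, False), (0.6, False), (0.5, False),
    (0.4, True), (0.3, False), (0.2, False), (0.1, False), (0.05, False),
]

if __name__ == "__main__":
    for contacted, captured in lift_points(SCORED, steps=5):
        print(f"{contacted:.0%} contacted -> {captured:.0%} of buyers reached")
```

A model with useful lift reaches most buyers after contacting only a small share of the population, whereas random selection would reach buyers in proportion to the share contacted.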
You can also use the cross-validation report, new in SQL Server 2008, to perform iterative subsampling of your
data to determine whether the model is biased to a particular set of data. The statistics that the report provides
can be used to objectively compare models and assess the quality of your training data.
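
The following Python sketch illustrates the general k-fold idea behind cross-validation: the data is dealt into partitions, and each partition in turn is held out for testing while the rest is used for training. It is not the Analysis Services report itself. The function names and the fit and score callbacks are assumptions made for this example.

```python
import random
from statistics import mean, pstdev
from typing import Any, Callable, List, Sequence, Tuple


def k_fold_indices(n_cases: int, folds: int, seed: int = 0) -> List[List[int]]:
    """Shuffle case indices and deal them into `folds` nearly equal partitions."""
    if not 2 <= folds <= n_cases:
        raise ValueError("folds must be between 2 and the number of cases")
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)
    return [indices[i::folds] for i in range(folds)]


def cross_validate(cases: Sequence[Any], folds: int,
                   fit: Callable[[List[Any]], Any],
                   score: Callable[[Any, List[Any]], float]) -> Tuple[float, float]:
    """Train on all partitions but one, score on the one held out, and return
    the mean and standard deviation of the per-fold scores."""
    results = []
    for held_out in k_fold_indices(len(cases), folds):
        held = set(held_out)
        training = [case for i, case in enumerate(cases) if i not in held]
        testing = [cases[i] for i in held_out]
        results.append(score(fit(training), testing))
    return mean(results), pstdev(results)
```

A large standard deviation across folds suggests that the model's quality depends heavily on which subset of the data it was trained on.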
For More Information: Viewing a Data Mining Model, Validating Data Mining Models (Analysis Services - Data
Mining)
Creating Predictions
The main goal of most data mining projects is to use a mining model to create predictions. After you explore and
compare mining models, you can use one of several tools to create predictions. Analysis Services provides a query
language called Data Mining Extensions (DMX) that is the basis for creating predictions and is easily scriptable. To
help you build DMX prediction queries, SQL Server provides a query builder, available in SQL Server Management
Studio and Business Intelligence Development Studio, and DMX templates for the query editor in Management
Studio. Within BI Development Studio, you access the query builder from the Mining Model Prediction tab of
Data Mining Designer.
For More Information: Creating DMX Prediction Queries, Data Mining Extensions (DMX) Statement Reference
SQL Server Management Studio
After you have used BI Development Studio to build mining models for your data mining project, you can manage
and work with the models and create predictions in Management Studio. By using the query tools in SQL Server
Management Studio, you can explore the data in your models, create complex content queries, or manage data
mining objects stored in an instance of SQL Server.
For More Information: Data Mining in SQL Server Management Studio

SQL Server Reporting Services


After you create a mining model, you may want to distribute the results to a wider audience. Because the results of
data mining are stored in a consistent schema that is readily accessible via database queries, you can use a variety
of client tools to present the results of analysis, to explore the patterns in the model, or to make predictions.
You can use Report Designer in Microsoft SQL Server Reporting Services to create reports, which you can use to
present the information that a mining model contains. You can use the result of any DMX query as the basis of a
report, and can take advantage of the parameterization and formatting features that are available in Reporting
Services.
For More Information: Using the Analysis Services DMX Query Designer (Reporting Services), Integrating
Reporting Services into Applications
Working Programmatically with Data Mining
Analysis Services provides several tools that you can use to programmatically work with data mining. The DMX
language provides statements that you can use to create, train, and use data mining models. You can also perform
these tasks by using a combination of XML for Analysis (XMLA) and Analysis Services Scripting Language (ASSL),
or by using Analysis Management Objects (AMO).
You can access all the metadata that is associated with data mining by using data mining schema rowsets. For
example, you can use schema rowsets to determine the data types that an algorithm supports, or the model
names that exist in a database.
For More Information: Data Mining Extensions (DMX) Reference, Data Mining Schema Rowsets, Using XML for
Analysis in Analysis Services (XMLA)
