Data Mining Tutorial0

This document provides an introduction and overview of the Basic Data Mining Tutorial. The tutorial teaches how to use SQL Server 2008 Analysis Services to create data mining models for a targeted marketing campaign. It includes lessons on preparing the database, building models using decision trees, clustering, and Naive Bayes algorithms, and exploring and testing the models. The overall goal is to analyze customer purchase data to identify potential buyers and target a mailing campaign.

Uploaded by

subbaraju123

Basic Data Mining Tutorial

Welcome to the Microsoft SQL Server 2008 Analysis Services (SSAS) Basic Data Mining Tutorial. Microsoft SQL
Server provides an integrated environment for creating and working with data mining models. In this Basic Data
Mining Tutorial, you will complete a scenario for a targeted mailing campaign in which you create three models for
analyzing customer purchasing behavior and targeting potential buyers. The tutorial demonstrates how to use the
data mining algorithms, mining model viewers, and data mining tools that are included in Microsoft SQL Server
Analysis Services. The fictitious company, Adventure Works Cycles, is used for all examples.
When you are comfortable using the data mining tools, we recommend that you also complete the Intermediate
Data Mining Tutorial, which demonstrates how to use forecasting, market basket analysis, time series, association
models, nested tables, and sequence clustering.
Tutorial Scenario
In this tutorial, you are an employee of Adventure Works Cycles who has been tasked with learning more about the
company's customers based on historical purchases, and then using that historical data to make predictions that
can be used in marketing. The company has never done data mining before, so you must create a new database
specifically for data mining and set up several data mining models.
What You Will Learn
This tutorial teaches you how to create and work with several different types of data mining models. It also teaches
you how to create a copy of a mining model, and apply a filter to the mining model. You then process the new
model and evaluate the model using a lift chart. After the model is complete, you use drillthrough to retrieve
additional data from the underlying mining structure.
In SQL Server 2008, Microsoft provides several new features that help you develop custom data mining models and
use the results more effectively.

Holdout Test Sets - When you create a mining structure, you can now divide the data in the mining
structure into training and testing sets.

Mining model filters - You can now attach filters to a mining model, and apply the filter during both
training and testing.

Drillthrough to Structure Cases and Structure Columns - You can now easily move from the general
patterns in the mining model to actionable detail in the data source.
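As a rough illustration of the mining model filter idea, the sketch below applies one predicate to both the training and the testing cases. It is a conceptual Python sketch only, not how Analysis Services implements filters; the function `apply_model_filter`, the column names, and the sample cases are all hypothetical.

```python
from typing import Callable, Dict, List

Case = Dict[str, object]


def apply_model_filter(cases: List[Case], keep: Callable[[Case], bool]) -> List[Case]:
    """Return only the cases that satisfy the filter predicate."""
    return [case for case in cases if keep(case)]


# Hypothetical cases; the column names are illustrative assumptions.
training_cases = [
    {"CustomerKey": 1, "Region": "Pacific", "BikeBuyer": 1},
    {"CustomerKey": 2, "Region": "Europe", "BikeBuyer": 0},
    {"CustomerKey": 3, "Region": "Pacific", "BikeBuyer": 0},
]
testing_cases = [
    {"CustomerKey": 4, "Region": "Europe", "BikeBuyer": 1},
    {"CustomerKey": 5, "Region": "Pacific", "BikeBuyer": 1},
]


def pacific_only(case: Case) -> bool:
    """Example filter condition: keep customers from one region."""
    return case["Region"] == "Pacific"


# The same condition is applied to both partitions, so the model is
# trained and tested on the same subset of customers.
filtered_training = apply_model_filter(training_cases, pacific_only)
filtered_testing = apply_model_filter(testing_cases, pacific_only)

print([c["CustomerKey"] for c in filtered_training])
print([c["CustomerKey"] for c in filtered_testing])
```

Using a single predicate for both partitions keeps the evaluation consistent with the population the model was trained on.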

This tutorial is divided into the following lessons:

Lesson 1: Preparing the Analysis Services Database (Basic Data Mining Tutorial)
In this lesson, you will learn how to create a new Analysis Services database, add a data source and data
source view, and prepare the new database to be used with data mining.
Lesson 2: Building a Targeted Mailing Structure (Basic Data Mining Tutorial)
In this lesson, you will learn how to create a mining model structure that can be used as part of a targeted
mailing scenario.
Lesson 3: Adding and Processing Models
In this lesson you will learn how to add models to a structure. The models you create are built with the
following algorithms:

Microsoft Decision Trees

Microsoft Clustering

Microsoft Naive Bayes

Lesson 4: Exploring the Targeted Mailing Models (Basic Data Mining Tutorial)
In this lesson, you will learn how to explore and interpret the findings of each model by using the mining model viewers.
Lesson 5: Testing Models (Basic Data Mining Tutorial)
In this lesson, you make a copy of one of the targeted mailing models, add a mining model filter to restrict
the training data to a particular set of customers, and then assess the viability of the model.
Lesson 6: Creating and Working with Predictions (Basic Data Mining Tutorial)
In this final lesson of the Basic Data Mining Tutorial, you use the model to predict which customers are
most likely to purchase a bike. You then drill through to the underlying cases to obtain contact
information.

Requirements
Make sure that the following are installed:

Microsoft SQL Server 2008

Microsoft SQL Server Analysis Services

The AdventureWorks2008 database.

To enhance security, the sample databases are not installed with SQL Server 2008. To install the official databases
for Microsoft SQL Server, visit the Microsoft SQL Sample Databases page.

Intermediate Data Mining Tutorials

(Analysis Services - Data Mining)
Microsoft Analysis Services provides an integrated environment for creating and working with data mining models.
You can easily bind to data sources, create and test multiple models on the same data, and deploy models for use
in predictive analysis.
In the Basic Data Mining Tutorial, you learned how to use Business Intelligence Development Studio to create a
data mining solution, and you built three models to support a targeted mailing campaign for analyzing customer
purchasing behavior and for targeting potential buyers.
In this tutorial, you are expected to be familiar with the data mining tools and with the mining model viewers that
were introduced in the Basic Data Mining Tutorial. This intermediate tutorial builds on that experience and
introduces several new scenarios, including forecasting and market basket analysis. In this tutorial, you will learn
how to create a time series model, an association model, and a sequence clustering model. You will also learn how
to use nested tables in a model, and how to create filters on nested tables.
All scenarios use the AdventureWorksDW2008 data source, but you will create different data source views for
different scenarios. The lessons are independent and can be completed in any order, as long as you create the
data source first.
Lesson Scenarios
After your success with the targeted mailing campaign, you have been asked to apply your knowledge of data
mining to develop several new models for use in business planning. These include the following new model types:

Time series models, to forecast the sales of products in different regions around the world. You will
develop individual models for each region and also a general model that can be used for cross-prediction.

Association model, to analyze groupings of products that are purchased during visits to the Adventure
Works Cycles e-commerce site. Based on this market basket model, you might recommend products to
customers.

Sequence clustering model, to analyze the order in which customers buy products. Based on this model,
you can plan changes in Web site design or new product offerings.
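To make the market basket idea more concrete, the following sketch counts how often pairs of products appear in the same basket and derives simple "if A then B" rules with a support count and a confidence value. This is a generic textbook calculation written in Python, not the Microsoft Association algorithm; the function `pair_rules`, the threshold, and the sample baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=2):
    """Return (antecedent, consequent, support, confidence) tuples for item pairs.

    support    = number of baskets that contain both items
    confidence = support / number of baskets that contain the antecedent
    """
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore duplicates within one basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        if count < min_support:
            continue
        rules.append((a, b, count, count / item_counts[a]))
        rules.append((b, a, count, count / item_counts[b]))
    # Strongest rules first: by confidence, then support, then name.
    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0], r[1]))


# Hypothetical baskets; product names are illustrative only.
baskets = [
    ["Mountain Bike", "Helmet", "Tire Tube"],
    ["Mountain Bike", "Helmet"],
    ["Helmet", "Tire Tube"],
    ["Mountain Bike", "Helmet", "Water Bottle"],
    ["Water Bottle"],
]

rules = pair_rules(baskets)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}: support={support}, confidence={confidence:.2f}")
```

A rule with high confidence and reasonable support is the kind of pattern that could drive a product recommendation.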

What You Will Learn

This tutorial teaches you how to create and work with several types of data mining algorithms. This tutorial also
introduces the following concepts:

Using nested tables to build models

Choosing a nested table key, time series key, or sequence key

Filtering nested tables when creating models or making predictions

Determining whether you have enough data to support a model

Creating a general model and applying it to multiple data sets
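The nested table concepts listed above can be pictured as one case row per customer with a list of related rows attached to it. The sketch below models that shape in Python and filters the nested rows; it is only a mental model, and the class names, field names, and sample data are hypothetical rather than taken from the AdventureWorksDW2008 schema.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class OrderLine:
    """One row of the nested table."""
    product: str   # plays the role of the nested table key
    quantity: int


@dataclass
class CustomerCase:
    """One case: a customer plus the nested rows that belong to it."""
    customer_key: int  # plays the role of the case key
    region: str
    products: List[OrderLine] = field(default_factory=list)


def filter_nested(case: CustomerCase, keep: Callable[[OrderLine], bool]) -> CustomerCase:
    """Return a copy of the case that keeps only the matching nested rows."""
    return CustomerCase(
        case.customer_key,
        case.region,
        [row for row in case.products if keep(row)],
    )


# Hypothetical case with three nested rows.
case = CustomerCase(
    customer_key=11000,
    region="Pacific",
    products=[
        OrderLine("Mountain Bike", 1),
        OrderLine("Helmet", 2),
        OrderLine("Water Bottle", 4),
    ],
)

# Keep only nested rows where more than one unit was bought.
multi_unit = filter_nested(case, lambda row: row.quantity > 1)
print([row.product for row in multi_unit.products])
```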

This tutorial is divided into the following lessons:

Lesson 1: Creating the Intermediate Data Mining Solution (Intermediate Data Mining Tutorial)

In this lesson, you will create a new project based on the AdventureWorksDW2008 database, to
support several new data source views and many more mining models.
Lesson 2: Building a Forecasting Scenario (Intermediate Data Mining Tutorial)
In this lesson, you will create a mining model that can be used as part of a forecasting scenario. You will
also explore mining models that are built with the Microsoft Time Series algorithm.
You will build models for individual regions, and then build a general model that can be used for cross-prediction.
Lesson 3: Building a Market Basket Scenario (Intermediate Data Mining Tutorial)
In this lesson, you will add a new data source view and learn how to work with nested tables and keys.
Based on this data, you will create a mining model that can be used as part of a market basket scenario.
You will also explore mining models that are built with the Microsoft Association algorithm.
Lesson 4: Building a Sequence Clustering Scenario (Intermediate Data Mining Tutorial)
In this lesson, you will create a mining model that can be used as part of a sequence clustering scenario.
You will also learn how to explore mining models that are built with the Microsoft Sequence Clustering
algorithm.
Lesson 5: Building Neural Network and Logistic Regression Models (Intermediate Data Mining Tutorial)
In this lesson, you will analyze the data from a call center to help improve its customer satisfaction. You
will use the Microsoft Neural Network algorithm to build a mining model to help you understand the data
and the trends in it. You will also build a logistic regression model to make predictions that can be used in
business planning.
Requirements
Make sure that the following are installed:

Microsoft SQL Server 2008

Microsoft SQL Server Analysis Services

SQL Server with the AdventureWorksDW2008 database.

By default, the sample databases are not installed, to enhance security. To install the official databases for
Microsoft SQL Server, visit the Microsoft SQL Sample Databases page and select AdventureWorksDW2008.

Data Mining Projects

(Analysis Services - Data Mining)
When you develop a data mining solution in Analysis Services, you first create an Analysis Services project. Within
this project, you define the source of data that you will use for analysis, and then set up a model that includes an
algorithm and custom instructions for handling the data. You can also continue to test and refine the model within
the project. When you are satisfied with the solution, you can deploy it to another server or use it in an application
to provide predictions and analysis.
The following sections outline the tools and processes for creating a data mining solution, and provide links to
resources to use for each step.
Creating an Analysis Services Project
When you develop a data mining solution, you must first create a new Analysis Services project by using Business
Intelligence Development Studio. Each data mining project contains the following four kinds of objects: data
sources; data source views, which are based on the data sources; mining structures, which define how the data is
used in the model; and mining models, which create and store patterns.
For More Information: Defining an Analysis Services Project, Defining a Data Source Using the Data Source
Wizard (Analysis Services)
Defining a Data Source
The data source defines the connection string and authentication information that the Analysis Services server will
use to connect to the data source. The data source can contain multiple tables or views. Analysis Services can use
datasets from both relational and Online Analytical Processing (OLAP) databases, or from external providers.
After you have defined this connection to a data source, you create a view that identifies the specific data that is
relevant to your model. The data source view also enables you to customize the way that the data in the data
source is supplied to the mining model. You can modify the structure of the data to make it more relevant to your
project, or choose only certain kinds of data. If you want to filter the data, you can do so in the data source view,
or in filters that are applied at the level of the model.
The requirements for how much data you will need, and how that data should be cleaned and formatted, will differ
depending on the algorithm that you use to investigate that data.
For More Information: Defining a Data Source View (Analysis Services)
Adding Mining Structures to an Analysis Services Project
Once you have enough data to begin analysis, you select the columns of data that are most relevant to your
business problem, and add mining structures to the project. A mining structure defines the columns of data, and
columns with nested tables, that are obtained from the data source view or from an OLAP cube in the project.
To add a new mining structure, you start the Data Mining Wizard, which walks you through the process of defining
the data and optionally creating an initial data mining model. When you create a structure, you can also partition
your data to include a training data set, used for building models, and a testing data set, which can be used to test
or validate all mining models that are based on that structure. You can use the Mining Structure tab of Data
Mining Designer to modify existing mining structures, including adding columns and nested tables.
For More Information: Creating a New Mining Structure, Data Mining Designer, Data Mining Wizard (Analysis
Services - Data Mining)
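The partitioning of a structure into training and testing sets can be sketched as a seeded random split. This is a conceptual Python sketch, not the mechanism Analysis Services uses internally; the function `split_holdout`, the 30 percent holdout fraction, and the seed value are assumptions chosen for illustration.

```python
import random


def split_holdout(cases, holdout_fraction=0.3, seed=42):
    """Split cases into (training, testing) lists.

    holdout_fraction is the share of cases reserved for testing.
    A fixed seed makes the split repeatable, so every model built
    on the same structure is tested against the same cases.
    """
    if not 0.0 <= holdout_fraction < 1.0:
        raise ValueError("holdout_fraction must be in [0, 1)")
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * holdout_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical case keys standing in for customer rows.
cases = list(range(1, 11))
training, testing = split_holdout(cases, holdout_fraction=0.3)

print("training:", sorted(training))
print("testing: ", sorted(testing))
```

Because the split is made once at the structure level, all models that share the structure can be compared on the same testing cases.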
Working with Data Mining Models
To each mining structure, you add one or more mining models. The mining model defines the algorithm, or the
method of analysis that you will use on the data. You process each model by running the data in the data source
view through the algorithm, which generates a mathematical model of the data. This process is also known as
training the model.
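To show what "training" means in the simplest terms, the sketch below runs a handful of cases through a textbook naive Bayes calculation with add-one smoothing and then scores a new case. It is a generic Python illustration and makes no claim about how the Microsoft Naive Bayes algorithm is implemented; the function names, column names, and sample cases are hypothetical.

```python
from collections import Counter, defaultdict


def train_naive_bayes(cases, target):
    """Count class and attribute-value frequencies from the training cases."""
    class_counts = Counter(case[target] for case in cases)
    value_counts = defaultdict(Counter)  # (attribute, class) -> value counts
    values_seen = defaultdict(set)       # attribute -> distinct values
    for case in cases:
        label = case[target]
        for attribute, value in case.items():
            if attribute == target:
                continue
            value_counts[(attribute, label)][value] += 1
            values_seen[attribute].add(value)
    return {
        "target": target,
        "class_counts": class_counts,
        "value_counts": value_counts,
        "values_seen": values_seen,
    }


def predict_probabilities(model, case):
    """Return normalized class probabilities for one case."""
    total = sum(model["class_counts"].values())
    scores = {}
    for label, class_count in model["class_counts"].items():
        score = class_count / total  # prior probability of the class
        for attribute, value in case.items():
            if attribute == model["target"] or attribute not in model["values_seen"]:
                continue
            count = model["value_counts"][(attribute, label)][value]
            n_values = len(model["values_seen"][attribute])
            score *= (count + 1) / (class_count + n_values)  # add-one smoothing
        scores[label] = score
    norm = sum(scores.values())
    return {label: score / norm for label, score in scores.items()}


# Hypothetical training cases; column names are illustrative only.
cases = [
    {"CommuteDistance": "0-1 Miles", "HomeOwner": "Yes", "BikeBuyer": "Yes"},
    {"CommuteDistance": "0-1 Miles", "HomeOwner": "No", "BikeBuyer": "Yes"},
    {"CommuteDistance": "10+ Miles", "HomeOwner": "Yes", "BikeBuyer": "No"},
    {"CommuteDistance": "10+ Miles", "HomeOwner": "No", "BikeBuyer": "No"},
    {"CommuteDistance": "0-1 Miles", "HomeOwner": "Yes", "BikeBuyer": "No"},
]

model = train_naive_bayes(cases, target="BikeBuyer")
probs = predict_probabilities(model, {"CommuteDistance": "0-1 Miles", "HomeOwner": "Yes"})
print(probs)
```

The trained "model" here is nothing more than a set of counts, which is why it can be explored visually and queried after processing.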
After the model has been processed, you can then visually explore the mining model and create prediction queries
against it.
Analysis Services provides several options for processing mining model objects, including the ability to control
which objects are processed and how they are processed. For example, you can process a structure and cache the
data, and then continue to add new models to the structure. If the data is cached, you can use drillthrough queries
to return detailed information about the cases used in the model.
For More Information: Data Mining Algorithms (Analysis Services - Data Mining), Processing Analysis Services
Objects, Using Drillthrough on Mining Models and Mining Structures (Analysis Services - Data Mining).
Validating Data Mining Models
After you have created a model, you can investigate the results and make decisions about which models perform
the best. On the Mining Model Viewer tab in Data Mining Designer, Analysis Services provides viewers for each
mining model type, which you can use to explore the mining models.
In the Mining Accuracy Chart tab of the designer, Analysis Services provides tools that you can use to directly
compare mining models and choose the most accurate or useful mining model. These tools include a lift chart,
a profit chart, and a classification matrix.
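The numbers behind a classification matrix and a lift measurement can be sketched in a few lines. The Python code below uses common textbook definitions and is not a description of how the Mining Accuracy Chart tab computes its charts; the function names, the 0.5 probability cutoff, and the sample scores are assumptions.

```python
def classification_matrix(actual, predicted):
    """Count cases for each (actual, predicted) pair."""
    matrix = {}
    for a, p in zip(actual, predicted):
        matrix[(a, p)] = matrix.get((a, p), 0) + 1
    return matrix


def cumulative_lift(actual, scores, target=1, fraction=0.5):
    """Lift of the top-scored fraction of cases over random selection.

    Cases are ranked by predicted probability; lift is the target rate
    in the top fraction divided by the target rate in the whole set.
    """
    ranked = sorted(zip(scores, actual), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, int(len(ranked) * fraction))
    hits_top = sum(1 for _, a in ranked[:n_top] if a == target)
    hits_all = sum(1 for a in actual if a == target)
    if hits_all == 0:
        raise ValueError("no cases with the target value")
    return (hits_top / n_top) / (hits_all / len(actual))


# Hypothetical test cases: 1 = bought a bike, 0 = did not.
actual = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.6, 0.2, 0.1]  # predicted probability of 1
predicted = [1 if s >= 0.5 else 0 for s in scores]

matrix = classification_matrix(actual, predicted)
lift = cumulative_lift(actual, scores, target=1, fraction=0.5)
print(matrix)
print(lift)
```

A lift above 1 means that contacting the top-ranked customers finds buyers faster than contacting customers at random, which is the property a targeted mailing depends on.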
You can also use the cross-validation report, new in SQL Server 2008, to perform iterative subsampling of your
data to determine whether the model is biased to a particular set of data. The statistics that the report provides
can be used to objectively compare models and assess the quality of your training data.
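The idea of iterative subsampling can be illustrated with a plain k-fold loop. The Python sketch below is a generic example and not the procedure behind the cross-validation report; the "model" is deliberately trivial (it always predicts the most common training label), and the function names and sample labels are hypothetical.

```python
from collections import Counter


def k_fold_indices(n_cases, k):
    """Assign case indices to k folds in round-robin order."""
    if not 2 <= k <= n_cases:
        raise ValueError("k must be between 2 and the number of cases")
    folds = [[] for _ in range(k)]
    for i in range(n_cases):
        folds[i % k].append(i)
    return folds


def cross_validate_majority(labels, k=3):
    """Return per-fold accuracy of a majority-class baseline.

    For each fold, the baseline is 'trained' on the other folds and
    scored on the held-out fold.
    """
    accuracies = []
    for test_idx in k_fold_indices(len(labels), k):
        held_out = set(test_idx)
        train_labels = [label for i, label in enumerate(labels) if i not in held_out]
        majority = Counter(train_labels).most_common(1)[0][0]
        correct = sum(1 for i in test_idx if labels[i] == majority)
        accuracies.append(correct / len(test_idx))
    return accuracies


# Hypothetical labels; the two "Yes" cases both land in the first fold.
labels = ["Yes", "No", "No", "Yes", "No", "No", "No", "No", "No"]
accuracies = cross_validate_majority(labels, k=3)
print(accuracies)
```

Large differences between folds, as in this small example, suggest that results depend heavily on which cases happen to be held out.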
For More Information: Viewing a Data Mining Model, Validating Data Mining Models (Analysis Services - Data
Mining)
Creating Predictions
The main goal of most data mining projects is to use a mining model to create predictions. After you explore and
compare mining models, you can use one of several tools to create predictions. Analysis Services provides a query
language called Data Mining Extensions (DMX) that is the basis for creating predictions and is easily scriptable. To
help you build DMX prediction queries, SQL Server provides a query builder, available in SQL Server Management
Studio and Business Intelligence Development Studio, and DMX templates for the query editor in Management
Studio. Within BI Development Studio, you access the query builder from the Mining Model Prediction tab of
Data Mining Designer.
For More Information: Creating DMX Prediction Queries, Data Mining Extensions (DMX) Statement Reference
SQL Server Management Studio
After you have used BI Development Studio to build mining models for your data mining project, you can manage
and work with the models and create predictions in Management Studio. By using the query tools in SQL Server
Management Studio, you can explore the data in your models, create complex content queries, or manage data
mining objects stored in an instance of SQL Server.
For More Information: Data Mining in SQL Server Management Studio

SQL Server Reporting Services

After you create a mining model, you may want to distribute the results to a wider audience. Because the results of
data mining are stored in a consistent schema that is readily accessible via database queries, you can use a variety
of client tools to present the results of analysis, to explore the patterns in the model, or to make predictions.
You can use Report Designer in Microsoft SQL Server Reporting Services to create reports, which you can use to
present the information that a mining model contains. You can use the result of any DMX query as the basis of a
report, and can take advantage of the parameterization and formatting features that are available in Reporting
Services.
For More Information: Using the Analysis Services DMX Query Designer (Reporting Services), Integrating
Reporting Services into Applications
Working Programmatically with Data Mining
Analysis Services provides several tools that you can use to programmatically work with data mining. The DMX
language provides statements that you can use to create, train, and use data mining models. You can also perform
these tasks by using a combination of XML for Analysis (XMLA) and Analysis Services Scripting Language (ASSL),
or by using Analysis Management Objects (AMO).
You can access all the metadata that is associated with data mining by using data mining schema rowsets. For
example, you can use schema rowsets to determine the data types that an algorithm supports, or the model
names that exist in a database.
For More Information: Data Mining Extensions (DMX) Reference, Data Mining Schema Rowsets, Using XML for
Analysis in Analysis Services (XMLA)
