Solution DWDM
Contents
1. Abstract:
2. Introduction:
3. Adventure works database:
4. Getting familiar with SQL Server Analysis Services (SSAS) tools and various data mining algorithms
4.1. Microsoft Association Algorithm
4.2. Microsoft Clustering Algorithm
4.3. Microsoft Time Series Algorithm
4.4. Microsoft Decision Trees Algorithm
5. Star schema:
5.1. Fact Tables and Dimension Tables
5.2. Steps in Star Schema Design:
6. Cube AND MDX:
6.1. Cube:
6.2. Dimensions:
6.3. Measure & Measure Groups
6.4. Steps in Building and Deploying a cube:
7. Querying On Cubes Using MDX-Examples
7.1. Queries:
7.2. Screenshots:
8. DataMining and DMX
8.1. Business Scenario1:
8.2. Business Scenario2:
1. Abstract:
The purpose of this project is to learn various database and data mining concepts and their
application to current business problems, using the Adventure Works database and various database
tools. We begin by designing a star schema and building a data warehouse OLAP cube for sales analysis
using SQL Server Analysis Services (SSAS), and we query the cube using the Multidimensional
Expressions (MDX) language to find current business trends. Next, we create data mining structures
using Data Mining Extensions (DMX) to find hidden patterns and make predictions that help improve the
business and its growth. The data mining techniques considered for these tasks are Microsoft
Association mining, cluster analysis, time series, and the Naïve Bayes algorithm.
2. Introduction:
A data warehouse is a centralized repository that stores data from multiple information sources and
transforms it into a common, multidimensional data model for efficient querying and analysis. OLAP
and data mining are two complementary technologies for business intelligence. Online Analytical
Processing (OLAP) is a database technology used to organize large business databases; it is
optimized for querying and reporting rather than for processing transactions. OLAP databases are
divided into one or more cubes, and each cube is organized and designed by a cube administrator to
fit the way that you retrieve and analyze data. OLAP is used in decision-support systems to analyze
aggregated information for sales, finance, budget, and many other types of applications. OLAP is
about aggregating measures based on dimension hierarchies and storing these precalculated
aggregations in a special data structure. With the help of preaggregations and special indexes, you
can query aggregated data and get decision-support query results back in real time.
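As a rough illustration of the preaggregation idea described above, the following Python sketch rolls a few made-up sales rows up a hypothetical Country > City hierarchy once and then answers queries from the stored totals. The data, the hierarchy, and the function names are assumptions made for this example only; SSAS uses its own storage structures.

```python
from collections import defaultdict

# Made-up detail rows: (country, city, sales_amount).
SALES_ROWS = [
    ("United States", "Seattle", 120.0),
    ("United States", "Seattle", 80.0),
    ("United States", "Portland", 50.0),
    ("Canada", "Toronto", 70.0),
]


def build_aggregations(rows):
    """Precompute totals at every level of the Country > City hierarchy."""
    totals = defaultdict(float)
    for country, city, amount in rows:
        totals[("All",)] += amount                 # grand total
        totals[("All", country)] += amount         # country level
        totals[("All", country, city)] += amount   # city level
    return dict(totals)


def query(aggregations, *path):
    """Answer a query with a dictionary lookup instead of rescanning rows."""
    return aggregations.get(("All",) + path, 0.0)


AGGREGATIONS = build_aggregations(SALES_ROWS)
print(query(AGGREGATIONS))                              # grand total
print(query(AGGREGATIONS, "United States"))             # one country
print(query(AGGREGATIONS, "United States", "Seattle"))  # one city
```

The detail rows are scanned only once, when the aggregations are built; every later query is a single lookup, which is the trade-off that makes aggregated queries fast.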
OLAP provides a very good view of what is happening, but it cannot predict what will happen in the
future or explain why it is happening. That part is done by data mining, which combines discovery
techniques with prediction techniques.
The project is carried out in the following steps:
1. Understand the Adventure Works database, which will be used in this project (fully understand the
transactional data available in the database).
2. Get familiar with the SQL Server Analysis Services (SSAS) tools and the various data mining
algorithms available in SSAS, such as Clustering, Association, and Time Series.
3. Come up with a list of questions: what questions need to be answered, and what metrics will help
business managers monitor and grow the business.
4. Based on the business questions that need to be answered, design a data staging layer in a star
schema format.
5. Build the cube to extract data from the star schema staging layer, and perform the data mining on
the cube.
6. Perform data mining on the Adventure Works database to find hidden patterns and information
using DMX and MDX.
3. Adventure works database:
Adventure Works Cycles is a large multinational bicycle manufacturer, with headquarters located in
Bothell, Washington. The company has approximately 300 employees, 29 of whom are sales
representatives. The primary distribution channel for Adventure Works Cycles is through the retail
stores of its resellers. These resellers are located in Australia, Canada, France, Germany, the
United Kingdom, and the United States. Adventure Works Cycles also sells to individual customers
worldwide by means of the Internet.
4. Getting familiar with SQL Server Analysis Services (SSAS) tools and
various data mining algorithms
In this section we learn about some data mining algorithms and their applications, which will be
used later in the project.
4.1. Microsoft Association Algorithm
Association models are built on datasets that contain identifiers both for individual cases and for the
items that the cases contain. A group of items in a case is called an itemset. An association model consists
of a series of itemsets and the rules that describe how those items are grouped together within the cases.
The rules that the algorithm identifies can be used to predict a customer's likely future purchases, based
on the items that already exist in the customer's shopping cart.
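The notions of itemset and rule can be made concrete with a small Python sketch. It computes the support of an itemset and the confidence of simple one-item rules over a handful of made-up cases. This is a generic textbook illustration under assumed data and an assumed support threshold, not the Microsoft Association implementation.

```python
from itertools import combinations

# Made-up cases: each case is the set of items in one customer's order.
CASES = [
    {"Mountain Bike", "Helmet", "Jersey"},
    {"Mountain Bike", "Helmet"},
    {"Road Bike", "Jersey"},
    {"Mountain Bike", "Jersey"},
]


def support(itemset, cases):
    """Fraction of cases that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(itemset <= case for case in cases) / len(cases)


def rules(cases, min_support=0.5):
    """Yield (antecedent, consequent, confidence) for frequent item pairs."""
    items = sorted(set().union(*cases))
    for a, b in combinations(items, 2):
        pair_support = support({a, b}, cases)
        if pair_support < min_support:
            continue
        # confidence(A -> B) = support(A and B) / support(A)
        yield a, b, pair_support / support({a}, cases)
        yield b, a, pair_support / support({b}, cases)


RULES = list(rules(CASES))
for antecedent, consequent, confidence in RULES:
    print(f"{antecedent} -> {consequent}: {confidence:.2f}")
```

A rule with high confidence, such as Helmet -> Mountain Bike in this toy data, is the kind of rule that would be used to recommend an item based on what is already in the cart.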
4.2. Microsoft Clustering Algorithm
The Microsoft Clustering algorithm is a segmentation algorithm provided by Analysis Services. The
algorithm uses iterative techniques to group cases in a dataset into clusters that contain similar
characteristics. These groupings are useful for exploring data, identifying anomalies in the data, and
creating predictions.
Clustering models identify relationships in a dataset that you might not logically derive through casual
observation. For example, you can logically discern that people who commute to their jobs by bicycle do
not typically live a long distance from where they work. The algorithm, however, can find other
characteristics about bicycle commuters that are not as obvious. In the following diagram, cluster A
represents data about people who tend to drive to work, while cluster B represents data about people
who tend to ride bicycles to work.
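The iterative grouping described above can be sketched with a plain k-means loop in Python. K-means is used here only as a simple stand-in for the idea of iterative clustering; it is not the SSAS implementation, and the cases, attributes, and starting centers are made up for the example.

```python
import math

# Made-up cases: (commute distance in miles, number of cars owned).
CASES = [(1, 0), (2, 0), (1, 1), (12, 2), (15, 2), (14, 3)]


def nearest(point, centers):
    """Index of the center closest to the point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(values) / len(points) for values in zip(*points))


def k_means(cases, centers, iterations=10):
    """Alternate between assigning cases to centers and moving the centers."""
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for case in cases:
            groups[nearest(case, centers)].append(case)
        # Keep the old center if a cluster ends up empty.
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    labels = [nearest(case, centers) for case in cases]
    return centers, labels


# Starting centers are chosen by hand here to keep the example deterministic.
CENTERS, LABELS = k_means(CASES, centers=[(0, 0), (10, 2)])
print(CENTERS)
print(LABELS)
```

In this toy data the short-distance commuters with few cars end up in one cluster and the long-distance commuters in the other, mirroring the bicycle versus car example in the text.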
• Decision Tree
In the following section we explain what a star schema is and the terms used with it, such as fact
table, dimension table, measures, and groups, and then design a star schema based on a list of
questions. This understanding is the first step in the data mining activity performed in this
project.
5. Star schema:
The star schema architecture is the simplest data warehouse schema. It is called a star schema because the
diagram resembles a star, with points radiating from a center. The center of the star consists of the fact
table, and the points of the star are the dimension tables.
5.1. Fact Tables and Dimension Tables
A fact table typically has two types of columns: foreign keys to dimension tables, and measures, which
contain numeric facts. A fact table can contain fact data at a detailed or an aggregated level.
A dimension is a structure, usually composed of one or more hierarchies, that categorizes data. A
dimension that has no hierarchies or levels is called a flat dimension or list. The primary keys of each
of the dimension tables are part of the composite primary key of the fact table. Dimensional attributes
help to describe the dimensional value. They are normally descriptive, textual values. Dimension tables
are generally smaller than the fact table.
Typical fact tables store data about sales, while dimension tables store data about geographic regions
(markets, cities), clients, products, times, and channels.
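To make the roles of fact and dimension tables concrete, here is a minimal sketch that builds a tiny star schema in an in-memory SQLite database from Python and runs a typical star join. The table names, column names, and rows are hypothetical and are not taken from AdventureWorksDW.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables hold descriptive attributes.
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE dim_region  (region_key  INTEGER PRIMARY KEY, country TEXT);

    -- The fact table holds foreign keys to the dimensions plus numeric measures.
    CREATE TABLE fact_sales (
        product_key  INTEGER REFERENCES dim_product(product_key),
        region_key   INTEGER REFERENCES dim_region(region_key),
        order_qty    INTEGER,
        sales_amount REAL
    );

    INSERT INTO dim_product VALUES (1, 'Road Bike'), (2, 'Jersey');
    INSERT INTO dim_region  VALUES (1, 'Canada'), (2, 'France');
    INSERT INTO fact_sales  VALUES (1, 1, 2, 3000.0), (2, 1, 5, 250.0), (1, 2, 1, 1500.0);
""")

# A typical star join: aggregate a measure, grouped by a dimension attribute.
SALES_BY_COUNTRY = conn.execute("""
    SELECT r.country, SUM(f.sales_amount)
    FROM fact_sales AS f
    JOIN dim_region AS r ON r.region_key = f.region_key
    GROUP BY r.country
    ORDER BY r.country
""").fetchall()
print(SALES_BY_COUNTRY)
```

The fact table stays narrow (keys and numbers), while the descriptive text lives once in each dimension table; every analytical query is then a join from the center of the star out to one or more points.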
6. Cube AND MDX:
6.1. Cube:
An OLAP cube is the basic unit of storage for multidimensional data; it lets us analyze the stored
data and study the patterns in it.
To build an SSAS cube, you must first start a project by following these steps:
All Programs -> Microsoft SQL Server -> SQL Server BIDS
You’re now presented with an empty window, which is an unusual beginning for a project created from a
template; you really have nothing to start with, so it’s time to start creating. The first component
you’ll need is somewhere to retrieve data from: a data source.
To create the data source you’ll use for your first cube, follow these steps:
Meanwhile, go ahead and click Next to continue creating your data source. In this next screen, it’s time to
set up a connection string.
If your AdventureWorksDW database is visible as a selection already, go ahead and choose it; if not,
click New.
For your server name, enter (local), and then drop down the box labeled Select or Enter a Database Name
and choose AdventureWorksDW.
You can now enter the user you want SSAS to impersonate when it connects to this data source. Select
Use the Service Account and click Next. Using the service account (the account that runs the SQL Server
Analysis Server service) is fairly common even in production, but make sure that service account has
privileges to read your data source.
For your data source name, type AdventureWorksDW and then click Finish.
Right-click Data Source Views and choose New Data Source View. Predictably, up comes the Data
Source View Wizard to walk you through the process. Click Next.
Make sure the AdventureWorksDW data source is selected and then click Next.
On the Select Tables and Views screen, choose FactInternetSales under Available objects and then click
the right arrow to move it into the Included Objects column on the right.
To add its related dimensions, click the Add Related Tables button as shown in Figure 18-3 and then click
Next. Note that one of the related tables is a fact, not a dimension. There’s no distinction made at this
level. Later, you will be able to select and edit dimensions individually.
The wizard will now ask you where to find the measure groups. Tell it that they are in your fact
tables and then click Next.
At this point, you have measures, but you still need some dimensions; the wizard will select the
dimension tables from your data source view to create as new dimensions. Again, by default they’re all
selected, and you can click Next.
The wizard is now ready to complete. Verify everything is done as per above steps. If everything appears
to be in order, click Finish.
Select Deploy FirstCube on the Build menu. You’ll see a series of status messages as the cube is built,
deployed, and processed for the first time. You’ll receive a few warnings when you deploy FirstCube;
as long as they’re warnings and not errors, you can safely ignore them for now.
When it’s done and you see Deployment Completed Successfully in the lower right, your first cube is
ready to browse.
7. Querying On Cubes Using MDX-Examples
7.1. Queries:
Find the products which have been ordered more than 500 times in all countries.
Retrieve all the products in descending order of their Internet sales amount for the year 2007.
What is the total sales amount in all countries for the year 2007?
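The logic behind these three questions can be sketched in Python on a few made-up fact rows. This is not MDX and does not run against the cube; the rows are invented, and the first question is read here as "order quantity summed over all countries exceeds 500", which is an assumption about its intent.

```python
from collections import defaultdict

# Made-up fact rows: (product, country, year, order_quantity, sales_amount).
ROWS = [
    ("Road Bike", "Canada", 2007, 300, 9000.0),
    ("Road Bike", "France", 2007, 250, 7500.0),
    ("Jersey", "Canada", 2007, 400, 2000.0),
    ("Jersey", "France", 2006, 200, 1000.0),
    ("Helmet", "Canada", 2007, 100, 500.0),
]


def total_by(rows, key_index, value_index, year=None):
    """Sum one measure per key, optionally restricted to a single year."""
    totals = defaultdict(float)
    for row in rows:
        if year is None or row[2] == year:
            totals[row[key_index]] += row[value_index]
    return dict(totals)


# Products whose order quantity, summed over all countries, exceeds 500.
quantity_by_product = total_by(ROWS, key_index=0, value_index=3)
POPULAR = sorted(p for p, qty in quantity_by_product.items() if qty > 500)

# Products in descending order of their 2007 sales amount.
sales_2007 = total_by(ROWS, key_index=0, value_index=4, year=2007)
RANKED_2007 = sorted(sales_2007, key=sales_2007.get, reverse=True)

# Total 2007 sales amount per country.
COUNTRY_2007 = total_by(ROWS, key_index=1, value_index=4, year=2007)

print(POPULAR, RANKED_2007, COUNTRY_2007)
```

Each question reduces to the same pattern: pick a measure, pick the dimension to group by, optionally slice on another dimension (the year), and then filter or sort the aggregated result.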
7.2. Screenshots:
What is the sales amount in all the countries?
What is the total sales amount in all countries for the year 2007?
Find the products which have been ordered more than 500 times in all countries.
In the following section we perform the main part of our project, i.e. the data mining activity as a
whole, building on the bits and pieces we have done so far.
The result of a star schema is nothing more than a table similar to a database table. Based on the
business need, we created such a table, and in the following section we find the hidden patterns in
it.
Find the tables which contain the data for the activity we are going to perform.
Find the appropriate data mining algorithm to solve our problem.
Steps to be followed:
Step 2 Result:
Number Cars Owned = 0, Yearly Income < 40419, Commute Distance = 0-1 Miles,
Determine the top two subcategories that a customer who has purchased a mountain bike and a jersey is
likely to purchase:
-- Return the two most likely additional subcategories.
SELECT
    Predict([Subcategories], 2) AS [Subcategories]
FROM
    [SubcategoryAssociations]
NATURAL PREDICTION JOIN
    -- The input case: a nested table of what the customer has already bought.
    (SELECT
        (SELECT 'Mountain Bikes' AS Subcategory
         UNION SELECT 'Jerseys' AS Subcategory
        ) AS Subcategories
    ) AS t
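As a simplified picture of what such a prediction does, the Python sketch below ranks recommendations for a basket using a few hypothetical association rules. The rules, their probabilities, and the ranking logic are assumptions made for illustration; the actual scoring performed by the Predict function on an association model is more involved.

```python
# Hypothetical rules learned by an association model:
# (items that must be in the basket, recommended subcategory, probability).
RULES = [
    ({"Mountain Bikes"}, "Tires and Tubes", 0.55),
    ({"Mountain Bikes", "Jerseys"}, "Helmets", 0.70),
    ({"Jerseys"}, "Caps", 0.40),
    ({"Road Bikes"}, "Bottles and Cages", 0.60),
]


def predict(basket, rules, top_n=2):
    """Return the top_n recommended subcategories for the given basket."""
    best = {}
    for antecedent, consequent, probability in rules:
        # A rule applies when its whole antecedent is in the basket and it
        # recommends something the customer has not already bought.
        if antecedent <= basket and consequent not in basket:
            best[consequent] = max(best.get(consequent, 0.0), probability)
    ranked = sorted(best, key=best.get, reverse=True)
    return ranked[:top_n]


PREDICTION = predict({"Mountain Bikes", "Jerseys"}, RULES)
print(PREDICTION)
```

The basket plays the role of the nested input table in the query, and top_n corresponds to the second argument of Predict.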
Software Used:
• Microsoft SQL Server
• Microsoft Visual Studio 2012 or later
• Microsoft SQL Server 2012 Business Intelligence