Data Mining Using SQL Server Analysis Services
In today’s world, software applications are moving from traditional information systems to business
intelligence systems. The growth of data, and of the information contained within it, has created a
need for organizations to develop new kinds of applications. To address this, data mining solutions
have become an integral part of many software solutions.
Data mining is the process of discovering actionable information from large sets of data. It helps
organizations analyze very large amounts of data in order to detect common patterns or learn new
things. It uses mathematical analysis to derive patterns and trends from existing data, because
these patterns typically cannot be discovered by traditional data exploration methods: the
relationships are too complex or there is too much data. However, existing data needs to be
organized through an ETL (Extract, Transform, Load) process before a mining technique is applied
to it.
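The ETL idea can be illustrated with a minimal Python sketch. It is only a stand-in under stated assumptions: the sample CSV text, the sales table and the three function names are hypothetical, and an in-memory SQLite database takes the place of a real data warehouse. In a SQL Server project this work would normally be done with Integration Services packages.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; the second data row has an unusable amount.
RAW = """customer,amount
 alice ,120.50
bob,not-a-number
Carol,80
"""

def extract(text):
    """Extract: read the raw CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: tidy customer names and drop rows with an invalid amount."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount cannot be parsed
        clean.append((row["customer"].strip().title(), amount))
    return clean

def load(records, connection):
    """Load: write the cleaned records into a table ready for analysis."""
    connection.execute("CREATE TABLE sales (customer TEXT, amount REAL)")
    connection.executemany("INSERT INTO sales VALUES (?, ?)", records)
    connection.commit()

connection = sqlite3.connect(":memory:")
load(transform(extract(RAW)), connection)
loaded = connection.execute(
    "SELECT customer, amount FROM sales ORDER BY customer"
).fetchall()
print(loaded)
```

Only the two rows with a parseable amount reach the table; the unparseable one is dropped during the transform step.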
There are many data mining techniques available to analyze data and drive the useful knowledge
and patterns from those. These techniques are ranged from extremely complex to basic. Each
technique serves a slightly different purpose or goal. Here are few examples approaches to data
mining.
Clustering
Cluster detection is a type of pattern recognition that is used to detect groups of similar records
within large data sets. It’s a bit like arranging a large amount of information into categories
using patterns that emerge during data analysis.
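As a rough illustration of clustering, here is a small k-means sketch in Python. k-means is only one of many clustering algorithms and is not necessarily the one Analysis Services applies; the customer data, the function name k_means and the choice of the first k points as starting centroids are assumptions made for the example.

```python
from math import dist

def k_means(points, k, iterations=10):
    """Group 2-D points into k clusters using plain k-means."""
    # Assumption: the first k points serve as the starting centroids.
    centroids = [tuple(p) for p in points[:k]]
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, clusters

# Hypothetical data: (age, purchases per year) for six customers.
customers = [(22, 3), (25, 4), (23, 2), (58, 20), (61, 22), (60, 19)]
centroids, clusters = k_means(customers, k=2)
for centroid, members in zip(centroids, clusters):
    print(centroid, members)
```

No categories are given in advance; the two groups (younger, infrequent buyers and older, frequent buyers) emerge from the data alone.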
Classification
Classification analysis is a systematic process for obtaining important and relevant information
about data and metadata (data about data). It helps identify which of a set of categories different
types of data belong to. Classification analysis is closely linked to cluster analysis, as the
classification can be used to cluster data.
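To make the idea of assigning data to known categories concrete, the following Python sketch uses a k-nearest-neighbours vote. This is one simple classification technique chosen for illustration, not the method used by any particular product; the labelled customer records and the function name classify are hypothetical.

```python
from collections import Counter
from math import dist

def classify(record, labelled, k=3):
    """Predict a category by majority vote of the k nearest labelled records."""
    neighbours = sorted(labelled, key=lambda item: dist(record, item[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical training data: (visits per month, spend per month) -> category.
labelled = [
    ((2, 10), "occasional"),
    ((3, 15), "occasional"),
    ((1, 5), "occasional"),
    ((12, 200), "loyal"),
    ((15, 260), "loyal"),
    ((10, 180), "loyal"),
]

print(classify((11, 190), labelled))
print(classify((2, 12), labelled))
```

Unlike clustering, the categories here are known beforehand, and the labelled examples decide where a new record belongs.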
Regression
Regression is a technique that aims to predict future outcomes using large sets of existing
variables. It is used, for example, to predict future user engagement, customer retention and even
property prices.
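The property price case can be sketched with a simple least-squares line fit in Python. The figures below are made up for illustration, fit_line is a hypothetical helper, and real regression models usually involve many variables rather than one.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both left unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: floor area in square metres and sale price in thousands.
areas = [50, 70, 90, 110]
prices = [150, 210, 270, 330]

slope, intercept = fit_line(areas, prices)
predicted = slope * 100 + intercept  # estimated price for a 100 square metre home
print(slope, intercept, predicted)
```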
Association Rule Learning
Association rule learning enables the discovery of interesting relations between different variables
in large databases. It uncovers hidden patterns in the data that can be used to identify variables
within the data and the co-occurrences of different variables that appear most frequently.
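Two measures commonly used to rank association rules are support (how often items occur together) and confidence (how often the rule holds when its left-hand side occurs). The Python sketch below computes both for a hypothetical set of shopping baskets; the function names and the data are assumptions for the example.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the transactions containing antecedent, the fraction that also contain consequent."""
    # Assumes the antecedent occurs at least once; otherwise this divides by zero.
    both = support(set(antecedent) | set(consequent), transactions)
    return both / support(antecedent, transactions)

# Hypothetical shopping baskets.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

print(support({"bread", "milk"}, transactions))
print(confidence({"bread"}, {"milk"}, transactions))
```

A rule such as "bread implies milk" is usually kept only when both its support and its confidence exceed chosen thresholds.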
Microsoft SQL Server Data Tools is used to build SQL Server relational databases, Azure SQL
databases, Integration Services packages, Analysis Services data models and Reporting Services
reports. Analysis Services contains data mining algorithms and query tools that make it easy to
build a comprehensive solution for a variety of projects. SQL Server Management Studio contains
tools for browsing models and managing data mining objects.
Defining the Problem
The first step is to determine the scope of the problem and analyze the business requirements in
order to define specific goals for the data mining project. Here we might need to conduct a data
availability study.
Preparing Data
In this step, we are typically working with a very large data set and cannot examine every
transaction for data quality; therefore, we might need data profiling and automated data cleansing
and filtering tools, such as Microsoft SQL Server Master Data Services or SQL Server Data Quality
Services, to explore the data and find inconsistencies.
Exploring Data
In this step, we can use tools such as Master Data Services to canvass available sources of data
and define their availability for data mining. We can use tools such as SQL Server Data Quality
Services, or the Data Profiler in Integration Services, to analyze the distribution of data and repair
issues such as incorrect or missing data.
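The kind of check a profiling tool performs can be sketched in a few lines of Python. This is a conceptual illustration only, not the Integration Services Data Profiler; the rows, the column name and the profile_column helper are hypothetical.

```python
from collections import Counter

def profile_column(rows, column):
    """Count missing values and the distribution of the remaining values."""
    values = [row.get(column) for row in rows]
    missing = sum(1 for v in values if v is None or v == "")
    distribution = Counter(v for v in values if v not in (None, ""))
    return {"rows": len(values), "missing": missing, "distribution": distribution}

# Hypothetical customer rows with two unusable region values.
rows = [
    {"id": 1, "region": "North"},
    {"id": 2, "region": "South"},
    {"id": 3, "region": None},
    {"id": 4, "region": "North"},
    {"id": 5, "region": ""},
]

report = profile_column(rows, "region")
print(report)
```

A report like this shows at a glance how much of a column is missing and whether its values are skewed, which helps decide what to repair before modelling.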
Building Models
In this step, we define the columns of data that we want to use by creating a mining structure.
When we process the mining structure, Analysis Services produces aggregates and other statistical
information that can be used for analysis. This information can be used by any mining model that is
based on the structure.
Processing a model is called training. Training is the process of applying a specific mathematical
algorithm to the data in the structure in order to extract patterns. The patterns that we find in
the training process depend on the following three points:
The selection of training data,
The algorithm we chose,
How we have configured the algorithm.
We can also use parameters to adjust each algorithm and apply filters to the training data to use
just a subset of the data. After data is passed through the model, the mining model object holds
summaries and patterns that can be queried or used for prediction.
We can explore the trends and patterns that the algorithms discover by using the viewers in Data
Mining Designer in SQL Server Data Tools. We can also test how well the models create predictions
by using tools in the designer such as the lift chart and classification matrix. To verify whether the
model is specific to our data or might be used to make inferences on the general population, we can
use the statistical technique called cross-validation to automatically create subsets of the data and
test the model against each subset.
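The mechanics of cross-validation and of a classification matrix can be sketched in Python as follows. This does not reproduce the Analysis Services implementation: the fold assignment, the label data and the majority-label "model" are placeholders chosen so the example stays short.

```python
from collections import Counter

def k_fold_indices(n_rows, k):
    """Yield (train, test) index lists, using each of k folds once as the test set."""
    folds = [list(range(i, n_rows, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

def majority_label(labels):
    """Placeholder model: always predict the most common training label."""
    return Counter(labels).most_common(1)[0][0]

# Hypothetical outcomes for twelve customers.
labels = ["buy", "buy", "no", "buy", "buy", "no",
          "buy", "buy", "no", "buy", "buy", "no"]

scores = []
matrix = Counter()  # classification matrix: (actual, predicted) -> count
for train, test in k_fold_indices(len(labels), k=4):
    guess = majority_label([labels[i] for i in train])
    hits = sum(labels[i] == guess for i in test)
    scores.append(hits / len(test))
    matrix.update((labels[i], guess) for i in test)

print(scores)
print(dict(matrix))
```

Similar accuracy across all folds suggests the model is not tied to one particular subset of the data, while the matrix shows which categories it gets wrong.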
Deploying and Updating Models
Here we can use the models to create predictions, which we can then use to make business
decisions. SQL Server provides the DMX (Data Mining Extensions) language that we can use to create
prediction queries, and Prediction Query Builder to help us build the queries.