Presentation Data Mining
Data mining
The process of extracting previously unknown information from large databases
and using it to make organisational decisions
Data mining is the process of looking at large banks of information to generate
new information. Intuitively, you might think that data “mining” refers to the
extraction of new data, but this isn’t the case; instead, data mining is about
extracting patterns and new knowledge from the data you’ve already
collected.
Why data mining?
Data mining is one of the most widely used methods to extract information
from large datasets.
There are various data mining techniques.
The biggest challenge is to analyse the data to extract meaningful information
that can be used to solve a problem or for the growth of the business.
There are powerful tools and techniques available to mine data and find
insights from it.
KDD PROCESS
Data mining process
Business understanding
First, it is required to understand the business objectives clearly and find
out what the business needs.
Next, we have to assess the current situation by finding out about the
resources, assumptions, constraints and other important factors
that should be considered.
Then, from the business objectives and the current situation, we need to
create data mining goals that achieve the business objectives within the
current situation.
Data understanding
First, the data understanding phase starts with initial data collection
from the available data sources, which helps us get familiar
with the data.
Then, the data needs to be explored by tackling the data mining
questions, which can be addressed using querying, reporting and
visualization.
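Exploring the data through simple querying and reporting can be sketched as follows. This is a minimal illustration, assuming the collected data fits in memory as a list of records; the `records` data and the helper names `profile` and `query` are hypothetical, not part of any particular tool.

```python
import statistics

# Hypothetical sample of collected records (illustrative only).
records = [
    {"region": "North", "sales": 120.0},
    {"region": "South", "sales": 95.0},
    {"region": "North", "sales": 130.0},
    {"region": "East", "sales": None},  # a missing value worth noticing early
]

def profile(rows, field):
    """Report basic statistics for one numeric field, counting missing values."""
    values = [r[field] for r in rows if r[field] is not None]
    return {
        "count": len(values),
        "missing": len(rows) - len(values),
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
    }

def query(rows, **conditions):
    """Return the rows whose fields equal all the given values (a simple query)."""
    return [r for r in rows if all(r.get(k) == v for k, v in conditions.items())]

print(profile(records, "sales"))
print(len(query(records, region="North")))
```

A report like this quickly surfaces issues such as missing values, which then feed into the data preparation phase.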
Data preparation
Data preparation typically consumes the largest share of the project
time (figures as high as 90% are often quoted). The outcome of the data
preparation phase is the final data set.
Once the available data sources are identified, they need to be selected,
cleaned, constructed and formatted into the desired form.
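The cleaning, constructing and formatting steps above can be sketched on a few records. This is a hypothetical example: the `raw` data, the `prepare` function, the policy of dropping rows with a missing age, and the derived `high_value` attribute are all assumptions chosen for illustration; real projects pick cleaning rules to suit their data.

```python
# Hypothetical raw records: inconsistent formatting and a missing value.
raw = [
    {"customer": " Alice ", "age": "34", "spend": "120.50"},
    {"customer": "BOB", "age": "", "spend": "80"},
    {"customer": "carol", "age": "41", "spend": "200"},
]

def prepare(rows):
    """Clean, construct and format raw rows into a final dataset."""
    prepared = []
    for row in rows:
        # Clean: skip rows with a missing age (one simple policy among many).
        if not row["age"].strip():
            continue
        age = int(row["age"])
        spend = float(row["spend"])
        prepared.append({
            # Format: trim whitespace and normalise capitalisation.
            "customer": row["customer"].strip().title(),
            "age": age,
            "spend": spend,
            # Construct: derive a new attribute from an existing one.
            "high_value": spend >= 100.0,
        })
    return prepared

for r in prepare(raw):
    print(r)
```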
Modeling
First, the modeling techniques to be used on the prepared dataset have to
be selected.
Next, a test scenario must be designed to validate the quality and
validity of the model.
Then, one or more models are created by running the modeling tool
on the prepared dataset.
Finally, the models need to be assessed carefully, involving stakeholders,
to make sure that the created models meet the business initiatives.
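One common way to design a test scenario is to hold out part of the data, build the model only on the rest, and measure it on the held-out part. The sketch below assumes hypothetical churn data and a deliberately simple threshold model; `train_test_split`, `fit_threshold` and `accuracy` are illustrative helpers written here, not calls from a specific library.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of the rows and hold out a fraction as the test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)

def fit_threshold(train_rows):
    """Choose the cut-off t (predict churn when hours < t) with fewest training errors."""
    candidates = sorted({hours for hours, _ in train_rows})

    def training_errors(t):
        return sum(1 for hours, churned in train_rows if (hours < t) != churned)

    return min(candidates, key=training_errors)

def accuracy(predict, test_rows):
    """Fraction of held-out rows the model labels correctly."""
    hits = sum(1 for hours, churned in test_rows if predict(hours) == churned)
    return hits / len(test_rows)

# Hypothetical labelled data: (monthly hours of product use, customer churned?).
data = [(hours, hours < 5) for hours in range(20)]
train, test = train_test_split(data)
t = fit_threshold(train)
print(f"threshold={t}, test accuracy={accuracy(lambda h: h < t, test):.2f}")
```

Keeping the test rows out of `fit_threshold` is the point of the design: the accuracy figure then estimates how the model behaves on data it has not seen.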
Evaluation
The model results must be evaluated in the context of the business
objectives set in the first phase.
In this phase, new business requirements may be raised due to the
new patterns that have been discovered in the model results or from
other factors.
A go or no-go decision must be made in this step before moving to the
deployment phase.
Deployment
The knowledge or information gained through the data mining process
needs to be presented in such a way that stakeholders can use it when they
want it.
In the deployment phase, plans for deployment, maintenance and
monitoring have to be created for implementation and future support.
Data Mining Architecture
Data mining techniques
Classification Analysis Technique
We use this data mining technique to retrieve important and
relevant information about data and metadata.
The process is similar to clustering in that it segments data records
into different groups, called classes.
Unlike clustering, however, the data analysts already know the classes
in advance. In classification analysis, we therefore apply algorithms
that learn from labelled examples to decide how new data should be
classified.
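As an illustration of classifying new data against known classes, here is a minimal k-nearest-neighbours classifier, one simple classification algorithm among many. The credit-risk records, the class names and the `knn_classify` function are hypothetical.

```python
import math
from collections import Counter

def knn_classify(training, new_point, k=3):
    """Label new_point by majority vote among its k nearest labelled neighbours."""
    by_distance = sorted(training, key=lambda item: math.dist(item[0], new_point))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical labelled records: (income in $1000s, debt in $1000s) -> class.
training = [
    ((80, 5), "low risk"),
    ((75, 10), "low risk"),
    ((90, 8), "low risk"),
    ((30, 40), "high risk"),
    ((25, 35), "high risk"),
    ((40, 45), "high risk"),
]

print(knn_classify(training, (85, 6)))   # low risk
print(knn_classify(training, (35, 38)))  # high risk
```

Because the classes ("low risk", "high risk") are fixed beforehand, this is classification rather than clustering.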
Association
We use this data mining technique to identify interesting relations
between different variables in a database and to uncover hidden
patterns in the data.
Association rules are very useful for examining and forecasting
behavior, and they are widely used in the retail industry.
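Association rules are commonly scored by support (how often the items appear together) and confidence (how often the consequent appears when the antecedent does). A minimal sketch on hypothetical market-basket transactions; the helper names are illustrative.

```python
# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

def count(itemset, baskets):
    """Number of transactions containing every item in itemset."""
    return sum(1 for b in baskets if itemset <= b)

def support(itemset, baskets):
    """Fraction of transactions containing every item in itemset."""
    return count(itemset, baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    return count(antecedent | consequent, baskets) / count(antecedent, baskets)

rule_support = support({"bread", "butter"}, transactions)
rule_confidence = confidence({"bread"}, {"butter"}, transactions)
print(f"bread -> butter: support={rule_support:.2f}, confidence={rule_confidence:.2f}")
# bread -> butter: support=0.60, confidence=0.75
```

Here 3 of 5 baskets contain both items, and 3 of the 4 baskets with bread also contain butter.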
Prediction
Prediction is a data mining technique used to discover the relationship
between independent and dependent variables.
For instance, we can use sales to predict future profit. Suppose sales is
the independent variable and profit is the dependent variable. We can then
fit a regression curve to historical data and use it for profit
prediction.
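The simplest regression curve is a straight line fitted by ordinary least squares. A sketch of the sales and profit example, assuming a linear relationship and using made-up figures:

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical history: sales (independent) and profit (dependent), in $1000s.
sales = [10, 20, 30, 40]
profit = [3, 5, 7, 9]

slope, intercept = fit_line(sales, profit)
print(f"profit = {slope:.2f} * sales + {intercept:.2f}")
print(f"predicted profit at sales=50: {slope * 50 + intercept:.2f}")
# profit = 0.20 * sales + 1.00
# predicted profit at sales=50: 11.00
```

Real data rarely lies exactly on a line; the same formula then gives the line with the smallest sum of squared errors.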
Clustering
A cluster is a group of similar data
objects. Clustering groups objects so
that they are similar to one another
within the same cluster and different
from objects in other clusters.
In other words, similarity is highest
if objects belong to the same group
and lowest otherwise.
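K-means is one widely used clustering algorithm and fits the description above: each point joins its nearest cluster centre, and each centre moves to the mean of its members. A minimal sketch, assuming hypothetical customer data, a fixed number of iterations and naive initialisation from the first k points (production implementations use smarter initialisation and a convergence test).

```python
import math

def kmeans(points, k, iterations=10):
    """Minimal k-means: assign points to the nearest centre, then move centres to cluster means."""
    centres = list(points[:k])  # naive initialisation: the first k points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centres[i]))
            clusters[nearest].append(p)
        # Recompute each centre as the mean of its members (keep the old centre if empty).
        centres = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centres[i]
            for i, c in enumerate(clusters)
        ]
    return centres, clusters

# Hypothetical customers: (visits per month, average basket in $10s).
points = [(1, 2), (8, 9), (1, 1), (2, 1), (9, 8), (8, 8)]
centres, clusters = kmeans(points, k=2)
for centre, members in zip(centres, clusters):
    print(centre, members)
```

Unlike the classification example, no labels are given: the two groups emerge only from how close the points are to each other.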
Sequential Patterns
This is an important data mining technique that seeks to discover similar
patterns in data over time.
In sales, with historical transaction data, businesses can identify sets of items
that customers buy together at different times of the year.
Businesses can then use this information to recommend these items to customers,
with better deals based on their purchasing frequency in the past.
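A very small form of sequential pattern mining counts ordered pairs: item A bought before item B by enough customers. The sketch below assumes hypothetical purchase histories and a hypothetical `min_support` of 0.5; full algorithms (for example GSP or PrefixSpan) extend the same idea to longer sequences.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase histories: each customer's items in time order.
sequences = {
    "c1": ["phone", "case", "charger"],
    "c2": ["phone", "charger"],
    "c3": ["laptop", "mouse"],
    "c4": ["phone", "case"],
}

def frequent_pairs(histories, min_support=0.5):
    """Find ordered pairs (A, B) where A is bought before B by enough customers."""
    counts = Counter()
    for items in histories.values():
        # combinations() keeps the original order, so each pair is (earlier, later).
        # Wrapping it in a set counts each pattern at most once per customer.
        counts.update(set(combinations(items, 2)))
    n = len(histories)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

print(frequent_pairs(sequences))
```

Here half of the customers bought a case after a phone and half bought a charger after a phone, so both patterns pass the threshold and could drive a follow-up recommendation.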
Decision Trees Technique
A decision tree is one of the most
important data mining techniques,
largely because the model is very
easy for users to understand.
Each question in the tree leads to a
further set of questions that help us
narrow down the data, so that at the
end we can make a final decision.
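The question-by-question structure can be shown with a small tree written by hand. The loan attributes, thresholds and decisions below are hypothetical; in practice, data mining tools learn the tree from training data by choosing the splits automatically.

```python
# A hypothetical, hand-built tree for a loan decision. Internal nodes ask a
# question about one attribute; leaves hold the final decision.
tree = {
    "question": ("income", 50_000),  # is income >= 50,000?
    "yes": {
        "question": ("existing_debt", 20_000),  # is debt >= 20,000?
        "yes": "review manually",
        "no": "approve",
    },
    "no": "decline",
}

def decide(node, record):
    """Follow yes/no answers from the root until a leaf (a decision) is reached."""
    while isinstance(node, dict):
        attribute, threshold = node["question"]
        node = node["yes"] if record[attribute] >= threshold else node["no"]
    return node

print(decide(tree, {"income": 65_000, "existing_debt": 5_000}))  # approve
print(decide(tree, {"income": 30_000, "existing_debt": 0}))      # decline
```

Reading the tree from the root down gives a plain explanation for every decision, which is why users find this model easy to understand.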
Advantages of Data Mining
Marketing / Retail
Finance / Banking
Manufacturing
Governments
Disadvantages of data mining
Privacy Issues
Security issues
Misuse of information/inaccurate information
Conclusion