Data Mining.intro

The document provides an overview of data mining, highlighting its importance in discovering hidden patterns in large datasets and its applications in various fields such as market analysis, fraud detection, and healthcare. It outlines the data mining process using the CRISP-DM model, which includes phases like business understanding, data understanding, data preparation, modeling, evaluation, and deployment. The document emphasizes the significance of data preparation and transformation in ensuring the success of data mining projects.

Uploaded by

urwakhan413
Copyright
© All Rights Reserved

DATA WAREHOUSING AND DATA MINING


Intro to DM
• Data mining is the search for hidden, valid, and potentially useful
patterns in huge data sets. It is all about discovering unsuspected,
previously unknown relationships in the data.
Intro Continued…
• Data mining is one of the most useful techniques that help
entrepreneurs, researchers, and individuals extract valuable
information from huge sets of data. Data mining is also
called Knowledge Discovery in Databases (KDD). The knowledge
discovery process includes data cleaning, data integration, data
selection, data transformation, data mining, pattern evaluation, and
knowledge presentation.
Why Data Mining?
• Data mining has a wide range of applications, which makes it a
young and promising field. It has attracted a great deal of
attention in the information industry and in society because of
the wide availability of huge amounts of data and the imminent
need to turn such data into useful information and knowledge.
That information and knowledge serve applications such as
market analysis. This is why data mining is also known as
knowledge discovery from data.
Types of Data

• Relational databases
• Data warehouses
• Advanced DB and information repositories
• Object-oriented and object-relational databases
• Transactional and Spatial databases
• Heterogeneous and legacy databases
• Multimedia and streaming databases
• Text databases
• Text mining and Web mining
Data Mining Applications
1. Market Analysis and Management
• Customer Profiling
• Identifying Customer Requirements
• Cross Market Analysis
• Target Marketing
• Determining Customer Purchasing Patterns
• Providing Summary Information
2. Fraud Detection
3. Corporate Analysis and Risk Management
• Finance Planning and Asset Evaluation
• Resource Planning
• Competition
4. Data Mining in Healthcare
5. Data Mining in Education
Data Mining process/Models

Cross-Industry Standard Process for Data Mining (CRISP-DM)

SEMMA (Sample, Explore, Modify, Model, Assess)


Data Mining Implementation Process
CRISP-DM is a reliable data mining model consisting of six
phases. It is a cyclical process that provides a structured
approach to the data mining process. The phases follow a
general sequence, but the order is not rigid: a project
often has to backtrack to earlier phases and repeat
actions.
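The cyclical nature of the process can be sketched as a small table of phase transitions. This is only an illustration: the `ALLOWED` moves below are an assumption based on the commonly drawn CRISP-DM reference diagram, and the names `Phase`, `is_valid_path`, and `count_backtracks` are hypothetical, not part of any library.

```python
from enum import Enum


class Phase(Enum):
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


# Assumed transitions, following the commonly drawn CRISP-DM diagram.
# Moves to a lower-numbered phase are the backtracking steps.
ALLOWED = {
    Phase.BUSINESS_UNDERSTANDING: {Phase.DATA_UNDERSTANDING},
    Phase.DATA_UNDERSTANDING: {Phase.BUSINESS_UNDERSTANDING,
                               Phase.DATA_PREPARATION},
    Phase.DATA_PREPARATION: {Phase.MODELING},
    Phase.MODELING: {Phase.DATA_PREPARATION, Phase.EVALUATION},
    Phase.EVALUATION: {Phase.BUSINESS_UNDERSTANDING, Phase.DEPLOYMENT},
    Phase.DEPLOYMENT: set(),
}


def is_valid_path(path):
    """Return True if every consecutive pair of phases is an allowed move."""
    return all(nxt in ALLOWED[cur] for cur, nxt in zip(path, path[1:]))


def count_backtracks(path):
    """Count the moves that return to an earlier phase."""
    return sum(1 for cur, nxt in zip(path, path[1:]) if nxt.value < cur.value)
```

A project that returns once from modeling to data preparation would be a valid path with one backtrack; a jump straight from business understanding to deployment would not be valid under these assumed transitions.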
Business understanding:

• In this phase, business and data mining goals are established.
• First, understand the business and client objectives. Define what
the client wants (which clients often do not know themselves).
• Take stock of the current data mining scenario. Factor resources,
assumptions, constraints, and other significant factors into your assessment.
• Using the business objectives and the current scenario, define your
data mining goals.
• A good data mining plan is very detailed and should be developed to
accomplish both business and data mining goals.
Data understanding:
• First, data is collected from the multiple data sources available in the organization.
• These data sources may include multiple databases, flat files, or data cubes.
Issues such as object matching and schema integration can arise during the
data integration process. It is a complex and tricky process, because data
from various sources is unlikely to match easily. For example, table A contains an
entity named cust_no whereas another table B contains an entity named cust-id.
• It is therefore difficult to be sure whether both of these objects refer to
the same value. Metadata should be used here to reduce errors in the
data integration process.
• The next step is to explore the properties of the acquired data. A good way to do
so is to answer the data mining questions (decided in the business understanding
phase) using query, reporting, and visualization tools.
• Based on the query results, the data quality should be ascertained. Missing
data, if any, should be acquired.
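The cust_no / cust-id example and the data quality check can be sketched in a few lines. This is a minimal sketch, not a definitive method: the `COLUMN_METADATA` mapping, the canonical name `customer_id`, the helper names, and the sample rows are all hypothetical, and the tables are simply merged on the shared key so that customers present in only one table are kept.

```python
# Hypothetical metadata: maps each source's column name to one canonical name.
COLUMN_METADATA = {
    "table_a": {"cust_no": "customer_id"},
    "table_b": {"cust-id": "customer_id"},
}


def to_canonical(rows, source):
    """Rename the columns of `rows` (a list of dicts) using the metadata."""
    mapping = COLUMN_METADATA[source]
    return [{mapping.get(col, col): val for col, val in row.items()}
            for row in rows]


def integrate(table_a, table_b):
    """Merge both sources on the canonical customer_id key."""
    merged = {}
    for row in to_canonical(table_a, "table_a") + to_canonical(table_b, "table_b"):
        merged.setdefault(row["customer_id"], {}).update(row)
    return list(merged.values())


def missing_counts(rows, columns):
    """Count, per column, the rows whose value is absent or None."""
    return {c: sum(1 for r in rows if r.get(c) is None) for c in columns}


# Made-up sample data for illustration only.
table_a = [{"cust_no": 1, "name": "Ann"}, {"cust_no": 2, "name": "Bob"}]
table_b = [{"cust-id": 1, "age": 34}, {"cust-id": 3, "age": None}]

customers = integrate(table_a, table_b)
report = missing_counts(customers, ["name", "age"])
print(report)
```

The report shows which attributes still have gaps after integration, which is the point at which missing data would be acquired.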
Data preparation

• The data preparation process consumes about 50-70% of a project's time
and effort.
• The data from different sources should be selected, cleaned, transformed,
formatted, anonymized, and constructed (if required).
• Data cleaning "cleans" the data by smoothing noisy data and filling
in missing values.
• For example, in a customer demographics profile, age data may be missing. The data
is then incomplete and should be filled in. In some cases there are outliers, for
instance an age of 300. Data can also be inconsistent, for instance when the name of
the customer differs between tables.
• Data transformation operations change the data to make it useful in data mining.
The applicable transformations are listed under Data transformation.
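Before those transformations, the cleaning cases described above (a missing age, an age of 300, a name written differently in different tables) can be sketched as follows. The age bounds, the choice of the median as fill value, and the helper names are assumptions made for illustration; other imputation strategies are equally possible.

```python
from statistics import median

# Assumed plausibility bounds for a human age; adjust to the data set.
MIN_AGE, MAX_AGE = 0, 120


def is_plausible(age):
    """True if the age is present and inside the assumed bounds."""
    return age is not None and MIN_AGE <= age <= MAX_AGE


def clean_ages(records):
    """Treat implausible ages as missing, then fill gaps with the median.

    Raises statistics.StatisticsError if no record has a plausible age.
    """
    fill = median(r["age"] for r in records if is_plausible(r["age"]))
    return [{**r, "age": r["age"] if is_plausible(r["age"]) else fill}
            for r in records]


def normalize_name(name):
    """Collapse whitespace and case so 'ann  LEE' and 'Ann Lee' match."""
    return " ".join(name.split()).title()


# Made-up sample data for illustration only.
records = [
    {"name": "Ann Lee", "age": 30},
    {"name": "ann  LEE", "age": None},  # missing age
    {"name": "Bob Ray", "age": 300},    # outlier
    {"name": "Cy Poe", "age": 40},
]
cleaned = [{**r, "name": normalize_name(r["name"])} for r in clean_ages(records)]
```

The original records are left unmodified; each step returns new dictionaries, which makes it easy to compare the data before and after cleaning.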
Data transformation

Data transformation operations contribute toward the success of
the mining process.
• Smoothing
• Aggregation
• Generalization
• Normalization
• Attribute construction
The result of this process is a final data set that can be used in
modeling.
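The slide names the five operations without defining them. The sketch below assumes their usual textbook meanings: smoothing removes noise, aggregation summarizes, generalization replaces raw values with higher-level concepts, normalization rescales values into a fixed range, and attribute construction derives new attributes. The concrete techniques chosen (moving average, per-key totals, age bands, min-max scaling) and all function names are illustrative assumptions.

```python
from collections import defaultdict


def smooth(values, window=3):
    """Smoothing: centered moving average (the window shrinks at the edges)."""
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def aggregate(rows, key, field):
    """Aggregation: total of `field` for each distinct value of `key`."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row[field]
    return dict(totals)


def generalize_age(age):
    """Generalization: replace a raw age with a coarser, assumed age band."""
    if age < 18:
        return "minor"
    return "adult" if age < 65 else "senior"


def min_max(values):
    """Normalization: rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def add_spend_per_visit(row):
    """Attribute construction: derive a new attribute from existing ones."""
    return {**row, "spend_per_visit": row["spend"] / row["visits"]}
```

For instance, `min_max([10, 20, 30])` maps the smallest value to 0 and the largest to 1, so attributes measured on different scales become comparable before modeling.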
Modelling

• In this phase, mathematical models are used to determine data
patterns.
• Based on the business objectives, suitable modeling techniques
should be selected for the prepared dataset.
• Create a scenario to test the quality and validity of the model.
• Run the model on the prepared dataset.
• Results should be assessed by all stakeholders to make sure that
the model can meet the data mining objectives.
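One common way to "create a scenario to test the model" is to hold back part of the prepared data and score the model on it. The sketch below assumes that approach and uses a nearest-centroid classifier purely as a stand-in technique; the slides do not prescribe either. The data set is made up and the function names are hypothetical.

```python
import random


def split(rows, test_fraction=0.25, seed=0):
    """Shuffle with a fixed seed and hold out a fraction for testing."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]


def fit_centroids(train):
    """Compute the mean feature vector (centroid) of each class."""
    sums, counts = {}, {}
    for features, label in train:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, x in enumerate(features):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {lbl: [s / counts[lbl] for s in acc] for lbl, acc in sums.items()}


def predict(centroids, features):
    """Assign the class whose centroid is nearest (squared Euclidean)."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda lbl: dist(centroids[lbl]))


def accuracy(centroids, test):
    """Fraction of held-out rows whose predicted label is correct."""
    hits = sum(1 for features, label in test
               if predict(centroids, features) == label)
    return hits / len(test)


# Made-up, well-separated data: (features, customer segment).
data = [
    ([1.0, 2.0], "low"), ([2.0, 1.0], "low"),
    ([1.5, 1.5], "low"), ([2.0, 2.0], "low"),
    ([8.0, 9.0], "high"), ([9.0, 8.0], "high"),
    ([8.5, 8.5], "high"), ([9.0, 9.0], "high"),
]
train, test = split(data)
centroids = fit_centroids(train)
score = accuracy(centroids, test)
```

The held-out score is what stakeholders would review when assessing whether the model meets the data mining objectives; real data is rarely this cleanly separated.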
Evaluation:

• In this phase, the patterns identified and the results generated by
the data mining model are evaluated against the business
objectives.
• Gaining business understanding is an iterative process. In fact,
new business requirements may be raised as a result of
data mining.
• A go or no-go decision is taken on moving the model into the deployment
phase.
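The go or no-go decision can be made concrete by turning the business objectives into measurable thresholds. The sketch below assumes a fraud detection model (one of the applications listed earlier) judged on precision and recall; the metric choice, the thresholds, the labels, and the function names are all assumptions for illustration.

```python
def confusion(actual, predicted, positive="fraud"):
    """Count true/false positives and negatives for one positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    tn = len(pairs) - tp - fp - fn
    return tp, fp, fn, tn


def precision_recall(actual, predicted, positive="fraud"):
    """Precision: share of flagged cases that are real. Recall: share found."""
    tp, fp, fn, _ = confusion(actual, predicted, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def go_no_go(actual, predicted, min_precision, min_recall):
    """Return 'go' only if both assumed business thresholds are met."""
    precision, recall = precision_recall(actual, predicted)
    if precision >= min_precision and recall >= min_recall:
        return "go"
    return "no-go"


# Made-up evaluation data for illustration only.
actual = ["fraud", "ok", "fraud", "ok", "ok", "fraud"]
predicted = ["fraud", "ok", "ok", "ok", "fraud", "fraud"]
decision = go_no_go(actual, predicted, min_precision=0.6, min_recall=0.6)
```

Raising either threshold above what the model achieves flips the decision to no-go, which sends the project back to an earlier phase.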
Deployment:

• In the deployment phase, you ship your data mining discoveries to
everyday business operations.
• The knowledge or information discovered during the data mining process
should be made easy for non-technical stakeholders to understand.
• A detailed deployment plan for shipping, maintaining, and
monitoring the data mining discoveries is created.
• A final project report is created with the lessons learned and key
experiences from the project. This helps to improve the
organization's business policy.
