Data Mining Questions 1st Unit
Data Mining Questions 1st Unit
KDD (Knowledge Discovery in Databases) is a process that involves the extraction of useful, previously
unknown, and potentially valuable information from large datasets. The KDD process is an iterative
process and it requires multiple iterations of the above steps to extract accurate knowledge from the
data. The following steps are included in KDD process:
Data Cleaning
Data cleaning is defined as removal of noisy and irrelevant data from collection.
Data Integration
Data integration is defined as heterogeneous data from multiple sources combined in a common source
(Data Warehouse). Data integration using Data Migration tools, Data Synchronization tools and
ETL(Extract-Load-Transformation) process.
Data Selection
Data selection is defined as the process where data relevant to the analysis is decided and retrieved from
the data collection. For this we can use Neural network, Decision Trees, Naive bayes, Clustering,
and Regression methods.
Data Transformation
Data Transformation is defined as the process of transforming data into appropriate form required by
mining procedure. Data Transformation is a two step process:
1. Data Mapping: Assigning elements from source base to destination to capture transformations.
Data Mining
Data mining is defined as techniques that are applied to extract patterns potentially useful. It transforms
task relevant data into patterns, and decides purpose of model using classification or characterization.
Pattern Evaluation
Pattern Evaluation is defined as identifying strictly increasing patterns representing knowledge based on
given measures. It find interestingness score of each pattern, and
uses summarization and Visualization to make data understandable by user.
Database Task Primitive
1. How to construct a data mining query
The primitives allow the user to interactively communicate with the data mining system during
discovery to direct the mining process, or examine the finding