Data Mining Questions 1st Unit

The KDD (Knowledge Discovery in Databases) process extracts valuable information from large datasets through iterative steps: data cleaning, integration, selection, transformation, mining, and pattern evaluation. Each step is crucial for ensuring the accuracy and relevance of the knowledge extracted. This document also outlines the task primitives that guide users in constructing data mining queries and describes the typical architecture of a data mining system.

Uploaded by

Aryan Sukhdewe
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
9 views6 pages

Data Mining Questions 1st Unit

The KDD (Knowledge Discovery in Databases) process involves extracting valuable information from large datasets through iterative steps including data cleaning, integration, selection, transformation, mining, and pattern evaluation. Each step is crucial for ensuring the accuracy and relevance of the knowledge extracted. Additionally, the document outlines database task primitives that guide users in constructing data mining queries and describes the typical architecture of a database management system.

Uploaded by

Aryan Sukhdewe
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 6

KDD Process (Knowledge discovery in database)

KDD (Knowledge Discovery in Databases) is the process of extracting useful, previously unknown, and potentially valuable information from large datasets. KDD is iterative: the steps below are typically repeated several times before accurate knowledge is obtained. The process includes the following steps:

Data Cleaning

Data cleaning is the removal of noisy and irrelevant data from the collection. It includes:

1. Handling missing values.

2. Smoothing noisy data, where noise is a random or variance error.

3. Detecting and resolving inconsistencies with data discrepancy detection and data transformation tools.
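The two cleaning tasks above (filling missing values, smoothing noise) can be sketched in plain Python. This is a minimal illustration, not a standard library for cleaning: it fills missing values with the column mean and smooths noise by replacing each value with the mean of its equal-width bin, both common textbook choices.

```python
from statistics import mean

def clean(values, n_bins=2):
    """Fill missing values (None) with the column mean, then smooth
    noise by replacing each value with the mean of its equal-width bin."""
    known = [v for v in values if v is not None]
    fill = mean(known)                          # mean of the observed values
    filled = [fill if v is None else v for v in values]

    lo, hi = min(filled), max(filled)
    width = (hi - lo) / n_bins                  # equal-width binning

    def bin_of(v):
        return min(int((v - lo) / width), n_bins - 1)

    bins = {}
    for v in filled:
        bins.setdefault(bin_of(v), []).append(v)
    # Bin-means smoothing: each value becomes its bin's mean
    return [mean(bins[bin_of(v)]) for v in filled]

cleaned = clean([4, None, 8, 21, 24, 25])
```

Other fill strategies (median, most frequent value, or dropping the record) are equally valid; the right choice depends on the data.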

Data Integration

Data integration combines heterogeneous data from multiple sources into a common store (a data warehouse). It is carried out with data migration tools, data synchronization tools, and the ETL (Extract-Transform-Load) process.
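A tiny sketch of the idea, assuming two hypothetical sources (a CRM and a billing system) that describe the same customers under different field names. The "transform" step maps both schemas onto one warehouse schema and the records are joined on the customer key; all names here are illustrative.

```python
# Two heterogeneous sources describing the same entities.
crm = [{"cust_id": 1, "name": "Asha"}, {"cust_id": 2, "name": "Ravi"}]
billing = [{"customer": 1, "total": 120.0}, {"customer": 2, "total": 75.5}]

def integrate(crm_rows, billing_rows):
    """ETL-style merge: map both source schemas onto a single
    warehouse schema, joining records on the customer key."""
    totals = {r["customer"]: r["total"] for r in billing_rows}
    return [
        {"id": r["cust_id"], "name": r["name"], "total": totals.get(r["cust_id"])}
        for r in crm_rows
    ]

warehouse = integrate(crm, billing)
```

Real ETL tools add scheduling, validation, and incremental loading on top of this basic map-and-join pattern.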

Data Selection

Data selection is the process of deciding which data is relevant to the analysis and retrieving it from the data collection. Methods such as neural networks, decision trees, naive Bayes, clustering, and regression can assist in this step.
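At its simplest, selection means filtering task-relevant records and projecting task-relevant attributes. A minimal sketch with made-up transaction data (the condition and attribute list stand in for whatever the analysis requires):

```python
transactions = [
    {"id": 1, "region": "north", "amount": 250, "clerk": "a"},
    {"id": 2, "region": "south", "amount": 90,  "clerk": "b"},
    {"id": 3, "region": "north", "amount": 40,  "clerk": "a"},
]

def select(rows, condition, attributes):
    """Keep only task-relevant rows and project the relevant attributes."""
    return [{k: r[k] for k in attributes} for r in rows if condition(r)]

# e.g. analyse only the "north" region, keeping id and amount
relevant = select(transactions, lambda r: r["region"] == "north", ["id", "amount"])
```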

Data Transformation

Data transformation is the process of converting data into the form required by the mining procedure. It is a two-step process:

1. Data mapping: assigning elements from the source schema to the destination schema to capture the transformations.

2. Code generation: creating the actual transformation program.
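A common concrete transformation is normalization, which rescales numeric attributes so that mining algorithms are not dominated by large-valued columns. A minimal min-max normalization sketch:

```python
def min_max(values, new_min=0.0, new_max=1.0):
    """Min-max normalization: linearly rescale values into
    the range [new_min, new_max]."""
    lo, hi = min(values), max(values)
    return [new_min + (v - lo) * (new_max - new_min) / (hi - lo) for v in values]

normalized = min_max([10, 20, 30])  # → [0.0, 0.5, 1.0]
```

Z-score normalization (subtract the mean, divide by the standard deviation) is the usual alternative when the minimum and maximum are unknown or unstable.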

Data Mining

Data mining applies techniques to the prepared data to extract potentially useful patterns. It transforms task-relevant data into patterns and decides the purpose of the model, e.g. classification or characterization.
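To make the classification case concrete, here is a deliberately tiny 1R-style classifier: for a single attribute, it predicts the majority class seen for each attribute value. The weather-style training data is invented for illustration; real mining would use richer algorithms (decision trees, naive Bayes, etc.).

```python
from collections import Counter, defaultdict

def one_rule(rows, attr, label):
    """1R-style model: for one attribute, map each attribute value
    to the majority class observed with that value."""
    by_value = defaultdict(Counter)
    for r in rows:
        by_value[r[attr]][r[label]] += 1
    return {v: counts.most_common(1)[0][0] for v, counts in by_value.items()}

train = [
    {"outlook": "sunny",    "play": "no"},
    {"outlook": "sunny",    "play": "no"},
    {"outlook": "rain",     "play": "yes"},
    {"outlook": "overcast", "play": "yes"},
]
model = one_rule(train, "outlook", "play")  # predict "play" from "outlook"
```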

Pattern Evaluation

Pattern evaluation is the identification of the truly interesting patterns representing knowledge, based on given interestingness measures. It computes an interestingness score for each pattern and uses summarization and visualization to make the results understandable to the user.
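Two standard interestingness measures for association rules are support and confidence: the support of an itemset is the fraction of transactions containing it, and the confidence of a rule A → B is support(A ∪ B) / support(A). A minimal sketch on made-up market baskets:

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of the itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent:
    support(antecedent ∪ consequent) / support(antecedent)."""
    return support(transactions, antecedent | consequent) / support(transactions, antecedent)

baskets = [{"milk", "bread"}, {"milk", "bread", "butter"}, {"bread"}, {"milk"}]
s = support(baskets, {"milk", "bread"})       # 2 of 4 baskets → 0.5
c = confidence(baskets, {"milk"}, {"bread"})  # 0.5 / 0.75 → 2/3
```

Patterns whose scores fall below user-given thresholds (see the interestingness primitives below) are discarded before presentation.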
Data Mining Task Primitives
1. How to construct a data mining query

The primitives allow the user to communicate interactively with the data mining system during discovery, to direct the mining process or examine the findings.

2. The primitives specify:


(1) The set of task-relevant data – which portion of the database is to be used
– Database or data warehouse name
– Database tables or data warehouse cubes
– Condition for data selection
– Relevant attributes or dimensions
– Data grouping criteria
(2) The kind of knowledge to be mined – what data mining functions are to be performed
– Characterization
– Discrimination
– Association
– Classification/prediction
– Clustering
– Outlier analysis
– Other data mining task
(3) The background knowledge to be used – what domain knowledge, concept hierarchies,
etc.

(4) Interestingness measures and thresholds – support, confidence, etc.


(5) Visualization methods – the form in which to display the results, e.g. rules, tables, charts, graphs
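The five primitives above can be collected into a single query specification. The structure below is a hypothetical sketch (it is not DMQL or any real system's API); every name and value is illustrative, showing only how the primitives fit together:

```python
# Hypothetical data mining query built from the five task primitives.
mining_query = {
    # (1) task-relevant data
    "task_relevant_data": {
        "database": "sales_db",
        "tables": ["transactions"],
        "condition": "year = 2024",
        "attributes": ["item", "amount", "region"],
        "grouping": "region",
    },
    # (2) kind of knowledge to be mined
    "kind_of_knowledge": "association",
    # (3) background knowledge, e.g. a concept hierarchy over items
    "background_knowledge": {"concept_hierarchy": ["item", "category", "department"]},
    # (4) interestingness measures and thresholds
    "interestingness": {"min_support": 0.05, "min_confidence": 0.7},
    # (5) presentation / visualization of results
    "visualization": "rules",
}
```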

Typical data mining system architecture


➢ Database, data warehouse, WWW or other information repository (store data)
➢ Database or data warehouse server (fetch and combine data)
➢ Knowledge base (turn data into meaningful groups according to domain knowledge)
➢ Data mining engine (perform mining tasks)
➢ Pattern evaluation module (find interesting patterns)
➢ User interface (interact with the user)
