KDD Process Mode Framework

KDD (Knowledge Discovery in Databases) refers to the process of discovering useful patterns and knowledge in large amounts of data. It involves cleaning and transforming the data, mining patterns with algorithms, and interpreting and evaluating the results. The goal is to extract hidden, useful information from large data sets to support informed decisions. The nine-step iterative KDD process begins with understanding the problem domain and ends with deploying the discovered knowledge back into the system.

Uploaded by

Sanchit Pal

Topic: KDD (Knowledge Discovery in Databases)

• The term KDD stands for Knowledge Discovery in Databases. It refers to the broad process of discovering knowledge in data and emphasizes the high-level application of specific Data Mining techniques.
• It is of interest to researchers in many fields, including artificial intelligence, machine learning, pattern recognition, databases, statistics, knowledge acquisition for expert systems, and data visualization.
• The main objective of the KDD process is to extract knowledge from data in the context of large databases.
• It does this by using Data Mining algorithms to identify what is deemed knowledge.
• KDD can be viewed as the automated, exploratory analysis and modeling of vast data repositories.
• KDD is the organized process of identifying valid, useful, and understandable patterns in large and complex data sets.
• Data Mining is the core of the KDD process. It involves inferring algorithms that explore the data, develop the model, and discover previously unknown patterns.
• The model is used to extract knowledge from the data, analyze the data, and make predictions.
• The availability and abundance of data today make knowledge discovery and Data Mining matters of considerable importance and necessity.
• Given the recent growth of the field, it is not surprising that a wide variety of techniques is now available to specialists and practitioners.

The KDD Process

• The knowledge discovery process (illustrated in the given figure) is iterative and interactive, and comprises nine steps.
• The process is iterative at each step, meaning that moving back to previous steps may be required.
• The process has many creative aspects, in the sense that one cannot present a single formula or a complete taxonomy of the right choices for each step and application type.
• It is therefore necessary to understand the process and the different needs and possibilities at each step.
• The process begins with determining the KDD objectives and ends with the implementation of the discovered knowledge. At that point the loop is closed, and Active Data Mining starts.
• As a result, changes are made in the application domain, for example, offering various features to cell phone users in order to reduce churn.
• This closes the loop: the effects are then measured on the new data repositories, and the KDD process is launched again.
• The following is a concise description of the nine-step KDD process, beginning with a managerial step:

1. Building up an understanding of the application domain

• This is the initial preparatory step. It sets the scene for understanding what should be done with the various decisions, such as transformation, algorithms, and representation.
• The people in charge of a KDD project need to understand and define the objectives of the end user and the environment in which the knowledge discovery process will take place (including relevant prior knowledge).

2. Choosing and creating a data set on which discovery will be performed

• Once the objectives are defined, the data that will be used for the knowledge discovery process should be determined.
• This includes finding out what data is available, obtaining the necessary data, and then integrating all the data for knowledge discovery into one data set, including the attributes that will be considered for the process.
• This step is important because Data Mining learns and discovers from the available data.
• The data is the evidence base for building the models. If some important attributes are missing, the entire study may fail. In this respect, the more attributes that are considered, the better.
• On the other hand, collecting, organizing, and operating complex data repositories is expensive, so there is a trade-off against the opportunity to best understand the phenomena.
• This trade-off is one place where the interactive and iterative nature of KDD shows. One starts with the best available data set, later expands it, and observes the effect on knowledge discovery and modeling.
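
As a rough illustration of integrating several sources into one data set, the sketch below merges two small tables keyed by a customer id. The table names, attribute names, and values are all hypothetical and are not taken from the notes; a real project would more likely use a database join or a data-frame library.

```python
# Hypothetical source tables keyed by customer id (illustrative data only).
billing = {101: {"monthly_fee": 30.0}, 102: {"monthly_fee": 45.0}, 103: {"monthly_fee": 20.0}}
usage = {101: {"minutes": 320}, 102: {"minutes": 510}}


def integrate(sources, attributes):
    """Merge per-key records from several sources into one list of rows.

    An attribute that no source provides for a key is stored as None, so the
    gap stays visible for the preprocessing and cleaning step.
    """
    keys = sorted(set().union(*(src.keys() for src in sources)))
    rows = []
    for key in keys:
        row = {"id": key}
        for attr in attributes:
            row[attr] = None
            for src in sources:
                if attr in src.get(key, {}):
                    row[attr] = src[key][attr]
        rows.append(row)
    return rows


dataset = integrate([billing, usage], ["monthly_fee", "minutes"])
```

Customer 103 has no usage record, so its `minutes` attribute is left as `None` rather than silently dropped.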

3. Preprocessing and cleaning

• In this step, data reliability is improved. It includes data cleaning, for example, handling missing values and removing noise or outliers. It may involve complex statistical techniques or the use of a Data Mining algorithm for this purpose.
• For example, when one suspects that a specific attribute is not reliable enough or has many missing values, this attribute can become the target of a supervised Data Mining algorithm. A prediction model for the attribute is built, and the missing values can then be predicted.
• The extent to which one pays attention to this step depends on many factors. In any case, studying these aspects is important and is often revealing in itself about enterprise information systems.
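
The idea of predicting missing values with a supervised model can be sketched as follows. This is a minimal example under stated assumptions: a simple least-squares line stands in for the prediction model, and the attribute names (`age`, `income`) and records are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x, fitted on complete records."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def impute(records, predictor, target):
    """Return copies of the records with missing target values (None) predicted."""
    complete = [r for r in records if r[target] is not None]
    a, b = fit_line([r[predictor] for r in complete], [r[target] for r in complete])
    return [
        dict(r, **{target: a + b * r[predictor]}) if r[target] is None else dict(r)
        for r in records
    ]


# Hypothetical records: the third one is missing its income value.
records = [
    {"age": 20, "income": 2000.0},
    {"age": 30, "income": 3000.0},
    {"age": 40, "income": None},
    {"age": 50, "income": 5000.0},
]
cleaned = impute(records, "age", "income")
```

The original records are left unchanged, which makes it easy to compare the data before and after imputation.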

4. Data Transformation

• In this step, data better suited to Data Mining is prepared and developed. Techniques here include dimension reduction (for example, feature selection and extraction, and record sampling) and attribute transformation (for example, discretization of numerical attributes and functional transformation).
• This step can be crucial for the success of the entire KDD project, and it is usually very project-specific. For example, in medical examinations, the quotient of two attributes may often be the most important factor, rather than each attribute by itself. In business, we may need to consider effects beyond our control, as well as efforts and temporal issues, for example, studying the effect of accumulated advertising.
• However, if we do not use the right transformation at the beginning, we may obtain a surprising effect that hints at the transformation needed in the next iteration. Thus, the KDD process reflects upon itself and leads to an understanding of the transformation needed.
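
Two of the attribute transformations mentioned above, discretization of a numerical attribute and a quotient of two attributes, might look like this in a small sketch. Equal-width binning is only one possible discretization scheme, and the attribute names and values are hypothetical.

```python
def equal_width_bins(values, n_bins):
    """Discretize numeric values into n_bins equal-width intervals (0 .. n_bins - 1)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0 for _ in values]
    # The maximum value would land in bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def add_ratio(rows, numerator, denominator, name):
    """Return copies of the rows with a derived attribute: a quotient of two attributes."""
    return [dict(r, **{name: r[numerator] / r[denominator]}) for r in rows]


ages = [18, 25, 40, 63, 78]
age_bins = equal_width_bins(ages, 3)

# Hypothetical medical records; the derived ratio is for illustration only.
patients = [{"weight_kg": 70.0, "height_m": 2.0}, {"weight_kg": 90.0, "height_m": 1.5}]
patients = add_ratio(patients, "weight_kg", "height_m", "weight_per_height")
```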

5. Prediction and description

• We are now ready to decide which type of Data Mining to use, for example, classification, regression, or clustering. This depends mainly on the KDD objectives, and also on the previous steps.
• There are two major objectives in Data Mining: prediction and description. Prediction is often referred to as supervised Data Mining, while descriptive Data Mining includes the unsupervised and visualization aspects of Data Mining.
• Most Data Mining techniques are based on inductive learning, where a model is built explicitly or implicitly by generalizing from a sufficient number of training examples. The underlying assumption of the inductive approach is that the trained model applies to future cases. The strategy also takes into account the level of meta-learning for the particular set of available data.

6. Selecting the Data Mining algorithm

• Having chosen the strategy, we now decide on the tactics. This step includes selecting the specific method to be used for searching patterns, which may include multiple inducers.
• For example, in considering precision versus understandability, the former is better with neural networks, while the latter is better with decision trees. For each strategy of meta-learning, there are several ways in which it can be accomplished.
• Meta-learning focuses on explaining what causes a Data Mining algorithm to succeed or fail on a particular problem. Thus, this approach attempts to understand the conditions under which a Data Mining algorithm is most appropriate.
• Each algorithm has parameters and tactics of learning, such as ten-fold cross-validation or another division into training and testing sets.
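
To make the ten-fold cross-validation mentioned above concrete, here is a minimal sketch of how the sample indices could be divided into training and testing parts. The function name is hypothetical, and the sketch does not shuffle the data, which a practical implementation usually would.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# Ten folds over 25 samples: every sample is used for testing exactly once.
folds = list(k_fold_indices(25, 10))
```

Each model is trained on the `train` indices of a fold and scored on its `test` indices; the ten scores are then averaged.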

7. Utilizing the Data Mining algorithm

• Finally, the implementation of the Data Mining algorithm is reached. In this step, we may need to run the algorithm several times until a satisfactory result is obtained, for example, by tuning the algorithm's control parameters, such as the minimum number of instances in a single leaf of a decision tree.
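
Running the algorithm several times with different control parameters can be sketched as below. The example uses a deliberately tiny one-dimensional regression tree written for illustration (not a production decision-tree learner), and the training and validation points, including one noisy value, are hypothetical.

```python
def build_tree(points, min_leaf):
    """Fit a 1-D regression tree in which every leaf holds at least min_leaf points.

    points is a list of (x, y) pairs sorted by x. A leaf is stored as a float
    (the mean of y); an internal node as (threshold, left, right).
    """
    ys = [y for _, y in points]
    mean = sum(ys) / len(ys)
    best = None
    for i in range(min_leaf, len(points) - min_leaf + 1):
        if points[i - 1][0] == points[i][0]:
            continue  # cannot split between equal x values
        left, right = ys[:i], ys[i:]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, i)
    base_sse = sum((y - mean) ** 2 for y in ys)
    if best is None or best[0] >= base_sse:
        return mean  # no allowed split improves the fit
    i = best[1]
    threshold = (points[i - 1][0] + points[i][0]) / 2
    return (threshold, build_tree(points[:i], min_leaf), build_tree(points[i:], min_leaf))


def predict(tree, x):
    """Walk from the root to a leaf and return the leaf's mean."""
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x < threshold else right
    return tree


def mse(tree, points):
    """Mean squared error of the tree on a list of (x, y) pairs."""
    return sum((predict(tree, x) - y) ** 2 for x, y in points) / len(points)


# Hypothetical data: a step function with one noisy training value at x = 2.
train = [(x, 0.0 if x < 5 else 10.0) for x in range(10)]
train[2] = (2, 6.0)
validation = [(x + 0.25, 0.0 if x < 5 else 10.0) for x in range(9)]

# Run the algorithm once per candidate value and keep the best on validation data.
candidates = [1, 2, 4, 8]
scores = {m: mse(build_tree(train, m), validation) for m in candidates}
best_min_leaf = min(candidates, key=lambda m: scores[m])
```

A very small minimum leaf size lets the tree fit the noisy point, while a very large one prevents any split, so an intermediate value tends to score best on the validation points.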

8. Evaluation

• In this step, we evaluate and interpret the mined patterns (rules, reliability, and so on) with respect to the objectives defined in the first step. Here we also consider the preprocessing steps with respect to their effect on the Data Mining algorithm results, for example, adding a feature in step 4 and repeating from there.
• This step focuses on the comprehensibility and usefulness of the induced model. In this step, the discovered knowledge is also documented for further use. The last step is the use of, and overall feedback on, the patterns and discovery results obtained by Data Mining.
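
For a predictive model, one common way to assess results against the objectives is to compare predictions with known outcomes. The sketch below computes accuracy, precision, and recall; these particular measures are an assumption, since the notes do not prescribe any, and the churn labels are hypothetical.

```python
def evaluate(actual, predicted, positive):
    """Compute accuracy, precision, and recall for one positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    correct = sum(a == p for a, p in pairs)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical outcomes for six customers.
actual = ["churn", "stay", "churn", "stay", "stay", "churn"]
predicted = ["churn", "stay", "stay", "stay", "churn", "churn"]
report = evaluate(actual, predicted, positive="churn")
```

If the figures fall short of the objectives from step 1, this is the point at which one would go back to an earlier step and repeat.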

9. Using the discovered knowledge

• We are now ready to incorporate the knowledge into another system for further action. The knowledge becomes active in the sense that we may make changes to the system and measure the effects.
• The success of this step determines the effectiveness of the whole KDD process. There are many challenges in this step, such as losing the "laboratory conditions" under which we have worked. For example, the knowledge was discovered from a static snapshot, usually a fixed set of data, but now the data becomes dynamic.
• Data structures may change (certain attributes may become unavailable), and the data domain may be modified (for example, an attribute may take a value that was not expected previously).
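
One simple way to notice these changes after deployment is to check incoming records against what was seen during discovery. The following is a minimal sketch under that assumption; the schema, attribute names, and values are hypothetical.

```python
def check_record(record, schema):
    """List the ways a new record departs from the data seen during discovery.

    schema maps each attribute name to the set of values observed for it.
    """
    problems = []
    for attr, seen in schema.items():
        if attr not in record:
            problems.append(f"missing attribute: {attr}")
        elif record[attr] not in seen:
            problems.append(f"unexpected value for {attr}: {record[attr]!r}")
    return problems


# Hypothetical schema captured from the static data set used for discovery.
schema = {"plan": {"prepaid", "postpaid"}, "region": {"north", "south"}}

ok = check_record({"plan": "prepaid", "region": "north"}, schema)
drifted = check_record({"plan": "family"}, schema)
```

A non-empty result signals that the "laboratory conditions" no longer hold for that record and that the model's output for it should be treated with caution.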
