KDD-Knowledge Discovery in Databases

Knowledge Discovery in Databases (KDD) is a process for extracting useful knowledge from data, involving steps like data cleaning, integration, selection, transformation, mining, evaluation, and representation. KDD offers advantages such as improved decision-making and efficiency but also poses challenges like privacy concerns and high costs. It is an iterative process that requires careful handling of data quality and complexity to avoid unintended consequences.

Uploaded by

Jayesh Gupta

Knowledge Discovery in Databases

Knowledge discovery in databases (KDD) is the process of discovering useful
knowledge from a collection of data. Data mining is one step within this
process, which also includes data selection and preparation, data cleansing,
incorporating prior knowledge about the data sets, and interpreting accurate
solutions from the observed results.

KDD is multidisciplinary. It encompasses data storage and access, scaling
algorithms to massive data sets, and interpreting results. The data cleansing
and data access steps of data warehousing facilitate the KDD process.
Artificial intelligence also supports KDD by discovering empirical laws from
experiments and observations. The patterns recognized in the data must remain
valid on new data with some degree of certainty. Such patterns are considered
new knowledge.

Steps involved in the entire KDD process are:

1. Data Cleaning

Data cleaning is the removal of noisy and irrelevant data from the
collection. It covers:

1. Handling missing values.

2. Smoothing noisy data, where noise is a random error or variance in a
measured variable.

3. Using data discrepancy detection and data transformation tools.
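
As a rough illustration of the first two cleaning cases, the sketch below fills missing values with the mean of the observed values and smooths noise by replacing each value with the mean of its bin. The function names, the sample readings, and the bin size are assumptions made for this example; mean imputation and bin-mean smoothing are only two of several possible choices.

```python
from statistics import mean


def fill_missing(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into bins of bin_size, and replace
    each value with the mean of its bin (a simple smoothing step)."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        smoothed.extend([mean(bin_values)] * len(bin_values))
    return smoothed


# Hypothetical readings with two missing entries.
readings = [4, 8, None, 15, 21, 21, 24, None, 25]
filled = fill_missing(readings)
cleaned = smooth_by_bin_means(filled, bin_size=3)
```

Note that bin-mean smoothing returns the values in sorted order, so it suits attributes where the original row order does not matter.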
2. Data Integration

Data integration is the combining of heterogeneous data from multiple sources
into a common store (a data warehouse). It is carried out with data migration
tools, data synchronization tools, and the ETL (Extract-Transform-Load)
process.
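
The ETL idea can be sketched in a few lines. The example below assumes two hypothetical sources (a CSV export and a list of billing rows) with different key names and units, and uses a plain dictionary as a stand-in for the warehouse; none of these names come from the notes.

```python
import csv
import io

# Two hypothetical sources describing the same customers in different shapes.
crm_csv = "cust_id,full_name\n1,Asha Rao\n2,Ben Ito\n"
billing_rows = [
    {"customer": 1, "amount_cents": 1250},
    {"customer": 2, "amount_cents": 990},
    {"customer": 1, "amount_cents": 500},
]


def extract_crm(text):
    """Extract: parse the CSV export into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(crm_rows, billing):
    """Transform: unify keys and units, and total the billing per customer."""
    merged = {
        int(row["cust_id"]): {"name": row["full_name"], "total": 0.0}
        for row in crm_rows
    }
    for bill in billing:
        merged[bill["customer"]]["total"] += bill["amount_cents"] / 100
    return merged


def load(warehouse, records):
    """Load: write the unified records into the common store."""
    warehouse.update(records)
    return warehouse


warehouse = load({}, transform(extract_crm(crm_csv), billing_rows))
```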

3. Data Selection

Data selection is the process of deciding which data is relevant to the
analysis and retrieving it from the data collection. Methods such as neural
networks, decision trees, Naive Bayes, clustering, and regression can be used
for this.
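
In its simplest form, selection keeps only the rows and attributes that matter for the task. The sketch below uses a hand-written relevance rule rather than any of the learned methods named above; the table, the attribute names, and the rule are all hypothetical.

```python
def select_relevant(records, attributes, is_relevant):
    """Keep only the rows that pass is_relevant, and only the listed
    attributes of each kept row."""
    return [
        {name: row[name] for name in attributes}
        for row in records
        if is_relevant(row)
    ]


customers = [  # hypothetical customer table
    {"id": 1, "region": "north", "age": 34, "spend": 120.0, "fax": None},
    {"id": 2, "region": "south", "age": 51, "spend": 75.5, "fax": None},
    {"id": 3, "region": "north", "age": 27, "spend": 310.0, "fax": None},
]

task_data = select_relevant(
    customers,
    attributes=["id", "age", "spend"],
    is_relevant=lambda row: row["region"] == "north",
)
```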

4. Data Transformation

Data transformation is the process of converting data into the form required
by the mining procedure. It is a two-step process:

1. Data mapping: assigning elements from the source base to the destination
to capture transformations.

2. Code generation: creating the actual transformation program.
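
One way to picture the two steps: the mapping is a small table from destination fields to source fields and conversions, and "code generation" is approximated here by building a callable from that table. This is a minimal sketch under those assumptions; the field names and conversions are hypothetical, and real tools typically emit SQL or scripts instead of a Python closure.

```python
# Data mapping: destination field -> (source field, conversion function).
MAPPING = {
    "age_years": ("age", int),
    "income_k": ("income", lambda v: float(v) / 1000),
    "city": ("town", str.title),
}


def build_transformer(mapping):
    """Stand-in for code generation: turn the mapping into a callable
    that converts one source record into a destination record."""
    def transform(record):
        return {
            dest: convert(record[src])
            for dest, (src, convert) in mapping.items()
        }
    return transform


to_mining_form = build_transformer(MAPPING)
row = to_mining_form({"age": "42", "income": "58000", "town": "new delhi"})
```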

5. Data Mining

Data mining is the application of techniques to extract potentially useful
patterns. It transforms task-relevant data into patterns and decides the
purpose of the model, using classification or characterization.
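
As one concrete example of extracting patterns, the sketch below counts item pairs that occur together in at least a given fraction of transactions. Frequent-pair counting is chosen here only for illustration; the notes do not prescribe a specific technique, and the baskets and the 0.5 threshold are made up.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return item pairs whose support (fraction of transactions
    containing both items) is at least min_support."""
    counts = Counter()
    for basket in transactions:
        # Deduplicate and sort so each pair has one canonical order.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


baskets = [  # hypothetical shopping baskets
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["bread", "milk"],
    ["butter", "jam"],
]
patterns = frequent_pairs(baskets, min_support=0.5)
```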

6. Pattern Evaluation

Pattern evaluation is the identification of the truly interesting patterns
that represent knowledge, based on given measures. It finds an interestingness
score for each pattern and uses summarization and visualization to make the
data understandable to the user.
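
Support, confidence, and lift are common interestingness measures for association rules, and they are used below as an assumed example of "given measures". The function name and the baskets are hypothetical, and the sketch assumes the antecedent and the consequent each occur in at least one transaction.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Compute support, confidence and lift for the rule
    antecedent -> consequent over a list of item sets."""
    n = len(transactions)
    with_antecedent = sum(1 for t in transactions if antecedent <= t)
    with_consequent = sum(1 for t in transactions if consequent <= t)
    with_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = with_both / n
    confidence = with_both / with_antecedent
    # Lift above 1 means the items co-occur more often than chance predicts.
    lift = confidence / (with_consequent / n)
    return support, confidence, lift


baskets = [  # hypothetical shopping baskets
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"butter", "jam"},
    {"milk"},
]
support, confidence, lift = rule_metrics(baskets, {"bread"}, {"milk"})
```

A rule would then be kept or discarded by comparing these scores against thresholds chosen for the task.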

7. Knowledge Representation

This involves presenting the results in a way that is meaningful and can be
used to make decisions.

Note: KDD is an iterative process in which evaluation measures can be
enhanced, mining can be refined, and new data can be integrated and
transformed in order to get different and more appropriate results.
Preprocessing of databases consists of data cleaning and data integration.

Advantages of KDD
1. Improves decision-making: KDD provides valuable insights and knowledge
that can help organizations make better decisions.

2. Increased efficiency: KDD automates repetitive and time-consuming tasks
and makes the data ready for analysis, which saves time and money.

3. Better customer service: KDD helps organizations gain a better
understanding of their customers’ needs and preferences, which can help them
provide better customer service.

4. Fraud detection: KDD can be used to detect fraudulent activities by
identifying patterns and anomalies in the data that may indicate fraud.

5. Predictive modeling: KDD can be used to build predictive models that can
forecast future trends and patterns.

Disadvantages of KDD
1. Privacy concerns: KDD can raise privacy concerns because it involves
collecting and analyzing large amounts of data, which can include sensitive
information about individuals.

2. Complexity: KDD can be a complex process that requires specialized skills
and knowledge to implement and to interpret the results.

3. Unintended consequences: KDD can lead to unintended consequences, such as
bias or discrimination, if the data or models are not properly understood or
used.

4. Data quality: The KDD process depends heavily on the quality of the data.
If the data is not accurate or consistent, the results can be misleading.

5. High cost: KDD can be an expensive process, requiring significant
investments in hardware, software, and personnel.
