Knowledge Discovery in Databases

The document discusses the principles of knowledge discovery in databases (KDD). It describes KDD as the process of automatically extracting hidden and useful knowledge from large amounts of data. The key steps in the KDD process are building an understanding of the problem domain, selecting and preprocessing the relevant data, applying data mining algorithms to discover patterns, and interpreting and evaluating the discovered patterns to extract useful knowledge. The goal of KDD is to discover knowledge that can help make better decisions and take more informed actions.

Uploaded by

Ma. Jessabel Azurin

We Are Data Rich but Information Poor

Databases are too big

Data Mining can help discover knowledge

Terrorbytes
Principles of Knowledge Discovery in Data
What Is Our Need?

Extract interesting knowledge (rules, regularities, patterns, constraints) from data in large collections.

[Figure labels: Knowledge, Data]

Data Collected

• Business transactions
• Scientific data (biology, physics, etc.)
• Medical and personal data
• Surveillance video and pictures
• Satellite sensing
• Games

Data Collected (Cont'd)

• Digital media
• CAD and Software engineering
• Virtual worlds
• Text reports and memos
• The World Wide Web

What is Knowledge Discovery?
• The process of discovering valuable information from a collection of data, or of converting raw data into useful information
• An activity that produces knowledge by discovering it or deriving it from existing information
• Refers to the overall process of discovering useful knowledge from data; data mining refers to a particular step in this process
Why do we need a knowledge discovery process?
• This is the information age; new data is created every day.
• Data overload makes it hard to find the right information.
• Knowledge discovery helps us find accurate information.
• There is an urgent need for a new generation of computational theories and tools to assist humans in extracting useful information from rapidly growing volumes of digital data.
What kinds of data can be processed?
• Database
• Data warehouse
• Transactional data
• Other kinds of data: time-related data, sequence data, data streams, spatial data (maps), multimedia data, graph and networked data, and the Web
What is Knowledge Discovery in Databases (KDD)?
• KDD refers to the broad procedure of discovering knowledge in data and emphasizes the high-level application of specific Data Mining techniques.
• It is of interest to researchers in many fields, including artificial intelligence, machine learning, pattern recognition, databases, statistics, knowledge acquisition for expert systems, and data visualization.
• The main objective of the KDD process is to extract information from data in the context of large databases. It does this by using Data Mining algorithms to identify what is deemed knowledge.
• KDD can be regarded as an automated, exploratory analysis and modeling of vast data repositories.
What is Knowledge Discovery in Databases (KDD)? (Cont'd)
• KDD is the organized procedure of identifying valid, useful, and understandable patterns in huge and complex data sets.

• Data Mining is the core of the KDD procedure: it involves inferring algorithms that explore the data, develop the model, and find previously unknown patterns.

• The model is used to extract knowledge from the data, to analyze the data, and to make predictions.
The Challenge (Humans aren’t particularly well suited to finding patterns in data. Computers, on the other hand…)
510201889052120015394581990000000014198812294488219960816210000001010001000000011000031111100000
000010031302000000000000002020010000000000000000000000000000434388888888424243424333012202022200
001010010000000441000000001100000000000000000100000100000000000000000000000000000000000000000000
000001998102751020189606012002126940968000000159019980903379811998091731001000001000100000001100
003200020000001000000012399000000000000200222200313100312000000000000000042438888888888424342423
321212122220000001011000000244100000000010020000000000000000000010000000000000000000000000000000
000000000000000000199812305102018970203200018626929200000047091998021356971199802273100000100100
010000000001101100000020000100000000021011000100000000000100000000000010001100000001110033888822
223311323343330000001100000111010011001020001000000001000000001000000000000000000000000000000000
000000000000000000000000000000019981221510201899093020052008986730000019410199901127598119990126
310010001010001000000000111110111112201010000011123001001000000102100022000000000020000000000000
111334388884342424243424233000000111100000101100100002441000000000100200000001001010000000100000
000000000000000000000000000000000000000000001999052551020189912272009354051583000001448419970527
179711997061031000000101100100000001000003111201200001001001012000111100100001101001200000000000
100000000001010132438888888888224242433100000001002100001110010011230100000010000020001000000000
011000010000000010000010000000000000000100000000000000000199811175102018991227200935405158300000
144841997052717972199806163100000010110010000000110100311111121000100000202210012220220020221222
201000000000000000001010011003243434321324221424242330021002100001111011000001122310011000001000
00010000000000110000100000000100000100000000000000000000000000000000001998122351020190001
Context
• Where you stand on Data Mining depends on where you sit:
• A business user will be interested in efficiency and results; validity may not be as important.
• A researcher will be interested in a different type of result, and validity will be important.
• A computer scientist may be interested in introducing new algorithms or computational approaches and achieving improved results or more efficient processing.
KDD: A Definition
KDD is the automatic extraction of non-obvious,
hidden knowledge from large volumes of data.

10^6 to 10^12 bytes: we never see the whole data set, so we will put it in the memory of computers.

Then run Data Mining algorithms.

What is the knowledge? How to represent and use it?
Data, Information, Knowledge
We often see data as a string of bits, or numbers and symbols, or “objects” which we collect daily.

Information is data stripped of redundancy, and reduced to the minimum necessary to characterize the data.

Knowledge is integrated information, including facts and their relations, which have been perceived, discovered, or learned as our “mental pictures”.

Knowledge can be considered data at a high level of abstraction and generalization.
The KDD Process
• The knowledge discovery process (illustrated in the given figure) is iterative and interactive, and comprises nine steps.

• The process is iterative at each stage, meaning that moving back to previous steps might be required.
• The process has many creative aspects, in the sense that one cannot present a single formula or a complete scientific taxonomy for the correct decisions for each step and application type.

• Thus, it is necessary to understand the process and the different requirements and possibilities at each stage.
The KDD Process
• The process begins with determining the KDD objectives and ends with the implementation of the discovered knowledge.
• At that point, the loop is closed, and Active Data Mining starts. Subsequently, changes would need to be made in the application domain, for example offering various features to cell phone users in order to reduce churn. This closes the loop: the impacts are then measured on the new data repositories, and the KDD process is launched again.
The KDD Process (figure)
KDD Steps can be Merged
Data cleaning + data integration = data pre-processing
Data selection + data transformation = data consolidation
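
The two merges above can be sketched as function composition. This is only an illustration: the function names, the record layout (dictionaries with `id` and `spend` fields), and the `regions` lookup table are all assumptions, not part of the original slides.

```python
def clean(rows):
    # Data cleaning: drop records that contain a missing (None) field.
    return [r for r in rows if None not in r.values()]

def integrate(rows, extra):
    # Data integration: attach fields from a second source keyed by "id".
    return [{**r, **extra.get(r["id"], {})} for r in rows]

def select(rows, fields):
    # Data selection: keep only the attributes relevant to the task.
    return [{f: r[f] for f in fields} for r in rows]

def transform(rows):
    # Data transformation: express "spend" in thousands as a new attribute.
    return [{**r, "spend_k": r["spend"] / 1000} for r in rows]

def preprocess(rows, extra):
    # Data cleaning + data integration = data pre-processing
    return integrate(clean(rows), extra)

def consolidate(rows, fields):
    # Data selection + data transformation = data consolidation
    return transform(select(rows, fields))

# Hypothetical toy data.
rows = [{"id": 1, "spend": 1200}, {"id": 2, "spend": None}, {"id": 3, "spend": 300}]
regions = {1: {"region": "north"}, 3: {"region": "south"}}

prepared = consolidate(preprocess(rows, regions), ["id", "region", "spend"])
print(prepared)
```

Record 2 is dropped during cleaning because its `spend` value is missing; the remaining records carry the integrated `region` field and the derived `spend_k` field.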

KDD Is an Iterative Process

The KDD Process
1. Building up an understanding of the application domain
• This is the initial preparatory step.
• It sets the scene for understanding what should be done with the various decisions (about transformation, algorithms, representation, etc.).
• The people in charge of a KDD project need to understand and define the objectives of the end-user and the environment in which the knowledge discovery process will take place (including relevant prior knowledge).
The KDD Process
2. Choosing and creating a data set on which discovery will be performed
• Once the objectives are defined, the data that will be used for the knowledge discovery process should be determined. This includes finding out what data is available, obtaining important data, and then integrating all the data for knowledge discovery into one data set, including the attributes that will be considered for the process. This step is important because Data Mining learns and discovers from the available data. This is the evidence base for building the models.
The KDD Process
2. Choosing and creating a data set on which discovery will be performed (Cont'd)
• If some significant attributes are missing, the entire study may be unsuccessful. From this respect, the more attributes are considered, the better. On the other hand, organizing, collecting, and operating advanced data repositories is expensive, so there is a trade-off against the opportunity for best understanding the phenomena. This trade-off is one place where the interactive and iterative nature of KDD comes into play: one starts with the best available data sets and later expands them and observes the impact in terms of knowledge discovery and modeling.
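
One way to picture "start with the best available data, then expand" is to measure how often each attribute is actually present and begin with the well-covered ones. The sketch below is an assumption-laden illustration: the helper names, the coverage thresholds, and the toy records are hypothetical.

```python
def attribute_coverage(records):
    """Fraction of records in which each attribute is present and not None."""
    names = sorted({k for r in records for k in r})
    n = len(records)
    return {a: sum(r.get(a) is not None for r in records) / n for a in names}

def choose_attributes(records, min_coverage):
    """Attributes available in at least `min_coverage` of the records."""
    coverage = attribute_coverage(records)
    return [a for a, c in coverage.items() if c >= min_coverage]

# Hypothetical records gathered from the accessible sources.
records = [
    {"age": 34, "plan": "basic", "income": None},
    {"age": 51, "plan": "premium"},
    {"age": None, "plan": "basic", "income": 42000},
    {"age": 29, "plan": "basic"},
]

first_pass = choose_attributes(records, 0.7)   # best available attributes
second_pass = choose_attributes(records, 0.2)  # expanded set for a later iteration
print(first_pass, second_pass)
```

The first pass keeps only `age` and `plan`; a later iteration can lower the threshold, bring in `income`, and observe whether the models improve enough to justify the cost of collecting it.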
The KDD Process
3. Preprocessing and cleansing
• In this step, data reliability is improved. It includes data cleaning, for example handling missing values and removing noise or outliers. It might involve complex statistical techniques or the use of a Data Mining algorithm. For example, when one suspects that a specific attribute is unreliable or has many missing values, this attribute could become the target of a supervised Data Mining algorithm.
• A prediction model for this attribute is built, and the missing data can then be predicted. The extent to which one pays attention to this step depends on many factors.
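
A minimal sketch of both ideas in this step, under stated assumptions: outliers are flagged with a median-absolute-deviation rule (one common choice, not named in the slides), and a missing attribute is predicted with a nearest-neighbour lookup standing in for "a supervised Data Mining algorithm". Function names, the threshold `k`, and the data are hypothetical.

```python
from statistics import median

def drop_outliers(values, k=3.0):
    """Keep values within k median-absolute-deviations of the median."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return list(values)
    return [v for v in values if abs(v - med) <= k * mad]

def impute_by_nearest(rows, target, feature):
    """Predict a missing `target` from the complete row with the closest `feature`."""
    known = [r for r in rows if r[target] is not None]
    filled = []
    for r in rows:
        if r[target] is None:
            nearest = min(known, key=lambda k_row: abs(k_row[feature] - r[feature]))
            r = {**r, target: nearest[target]}
        filled.append(r)
    return filled

readings = [10, 12, 11, 13, 12, 95]          # 95 looks like noise
print(drop_outliers(readings))

rows = [
    {"age": 25, "income": 30},
    {"age": 40, "income": 55},
    {"age": 42, "income": None},            # missing value to be predicted
]
print(impute_by_nearest(rows, target="income", feature="age"))
```

The row with age 42 receives the income of its nearest complete neighbour (age 40). A real project would likely use a stronger model and validate the imputed values.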
The KDD Process
4. Data Transformation
• In this stage, appropriate data for Data Mining is prepared and developed. Techniques here include dimension reduction (for example, feature selection and extraction, and record sampling) and attribute transformation (for example, discretization of numerical attributes and functional transformation). This step can be essential for the success of the entire KDD project, and it is typically very project-specific.
• An example is studying the effect of advertising accumulation. Even if we do not use the right transformation at the start, we may obtain a surprising effect that hints at the transformation required in the next iteration. Thus, the KDD process reflects upon itself and leads to an understanding of the transformation required.
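
The transformations named in this step can be sketched in a few lines. The choices here are assumptions made for illustration: equal-width binning as the discretization, a base-10 logarithm as the functional transformation, and dropping constant attributes as a very simple form of feature selection.

```python
import math

def discretize(values, bins):
    """Equal-width discretization: map each number to a bin index 0..bins-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0   # avoid zero width when all values are equal
    return [min(int((v - lo) / width), bins - 1) for v in values]

def log_transform(values):
    """Functional transformation: compress a skewed positive attribute."""
    return [math.log10(v) for v in values]

def drop_constant_columns(rows):
    """Feature selection: remove attributes that never vary."""
    keep = [k for k in rows[0] if len({r[k] for r in rows}) > 1]
    return [{k: r[k] for k in keep} for r in rows]

ages = [18, 25, 33, 47, 62, 78]
print(discretize(ages, bins=3))

print(log_transform([10, 100, 1000]))

rows = [{"country": "PH", "plan": "basic"}, {"country": "PH", "plan": "premium"}]
print(drop_constant_columns(rows))
```

With three bins over the range 18 to 78, each bin is 20 years wide, so the ages fall into bins 0, 0, 0, 1, 2, 2. Which transformation is appropriate remains project-specific, as the slide notes.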
The KDD Process
5. Data Mining (Extract Patterns or Models)
• We are now prepared to decide which kind of Data Mining to use, for example classification, regression, or clustering. This mainly depends on the KDD objectives and also on the previous steps. There are two major goals in Data Mining: prediction and description.
• Prediction is usually referred to as supervised Data Mining, while descriptive Data Mining includes the unsupervised and visualization aspects of Data Mining. Most Data Mining techniques are based on inductive learning, where a model is built explicitly or implicitly by generalizing from a sufficient number of training examples. The fundamental assumption of the inductive approach is that the trained model applies to future cases. The strategy also takes into account the level of meta-learning for the specific set of available data.
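
To contrast the two goals, here is a deliberately tiny sketch on one-dimensional data. A 1-nearest-neighbour rule stands in for prediction (it needs labelled training examples), and a two-cluster k-means stands in for description (it needs no labels). Both algorithm choices and all names are illustrative assumptions.

```python
def predict_1nn(train, x):
    """Prediction (supervised): return the label of the closest training example."""
    return min(train, key=lambda example: abs(example[0] - x))[1]

def two_means(values, iters=10):
    """Description (unsupervised): split 1-D values into two clusters."""
    c0, c1 = min(values), max(values)          # initial cluster centres
    g0, g1 = list(values), []
    for _ in range(iters):
        g0 = [v for v in values if abs(v - c0) <= abs(v - c1)]
        g1 = [v for v in values if abs(v - c0) > abs(v - c1)]
        if not g1:                             # all values identical
            break
        c0, c1 = sum(g0) / len(g0), sum(g1) / len(g1)
    return g0, g1

# Labelled examples: (monthly usage, label)
train = [(1, "low"), (2, "low"), (10, "high"), (12, "high")]
print(predict_1nn(train, 9))

# Unlabelled values: the algorithm proposes the grouping itself.
print(two_means([1, 2, 3, 10, 11, 12]))
```

The classifier generalizes from the training examples to a new case (the inductive assumption), while the clustering only describes structure already present in the data.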
The KDD Process
6. Selecting the Data Mining algorithm
• Having chosen the strategy, we now decide on the tactics. This stage includes selecting the particular method to be used for searching for patterns, which may include multiple inducers.
• For each strategy of meta-learning, there are several ways in which it can be carried out. Meta-learning focuses on explaining what causes a Data Mining algorithm to succeed or fail on a specific problem. Thus, this approach attempts to understand the conditions under which a Data Mining algorithm is most suitable. Each algorithm has parameters and learning tactics, such as ten-fold cross-validation or another division of the data for training and testing.
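
Ten-fold cross-validation divides the records into ten parts and lets each part serve once as the test set. A minimal sketch of the index bookkeeping follows; the function name is hypothetical, and it assumes the records have already been shuffled.

```python
def k_fold_indices(n, k=10):
    """Yield (train_idx, test_idx) pairs; every index is tested exactly once."""
    # Fold i holds indices i, i + k, i + 2k, ... (assumes pre-shuffled data).
    folds = [list(range(i, n, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f_i, fold in enumerate(folds) if f_i != i for j in fold]
        yield train, test

splits = list(k_fold_indices(20, k=10))
for train_idx, test_idx in splits[:2]:
    print(len(train_idx), test_idx)
```

For 20 records, each of the ten rounds trains on 18 records and tests on the remaining 2. The model's score is then averaged over the rounds, which gives a steadier estimate than a single train/test division.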
The KDD Process
7. Utilizing the Data Mining algorithm
• Finally, the Data Mining algorithm is run.
• In this stage, we may need to run the algorithm several times until a satisfactory outcome is obtained, for example by tuning the algorithm's control parameters, such as the minimum number of instances in a single leaf of a decision tree.
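
The effect of such a control parameter can be seen even on a one-level decision tree (a "stump"). The sketch below is an illustration under assumptions: the stump learner, the `min_leaf` parameter name, and the data are hypothetical, and `min_leaf` is assumed to be at least 1.

```python
from collections import Counter

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def fit_stump(data, min_leaf):
    """Fit a one-split tree on (x, label) pairs.

    Returns (threshold, left_label, right_label) with at least `min_leaf`
    instances in each leaf, or None if no such split exists.
    """
    data = sorted(data)
    best = None
    for i in range(min_leaf, len(data) - min_leaf + 1):
        left, right = data[:i], data[i:]
        if left[-1][0] == right[0][0]:
            continue  # cannot split between equal x values
        left_label = majority([label for _, label in left])
        right_label = majority([label for _, label in right])
        correct = (sum(label == left_label for _, label in left)
                   + sum(label == right_label for _, label in right))
        threshold = (left[-1][0] + right[0][0]) / 2
        if best is None or correct > best[0]:
            best = (correct, threshold, left_label, right_label)
    return None if best is None else best[1:]

data = [(1, "no"), (2, "no"), (3, "no"), (4, "yes"),
        (5, "yes"), (6, "yes"), (7, "yes"), (8, "no")]

# Run the algorithm several times with different parameter values.
for min_leaf in (1, 4, 5):
    print(min_leaf, fit_stump(data, min_leaf))
```

With `min_leaf=1` the split lands at 3.5; requiring four instances per leaf forces it to 4.5; requiring five makes any split impossible on eight records. Comparing such runs is how a satisfactory setting is found in practice.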
The KDD Process
8. Evaluation
• In this step, we assess and interpret the mined patterns (rules, reliability, etc.) with respect to the objectives defined in the first step. Here we also consider the preprocessing steps with respect to their impact on the Data Mining algorithm results.
• For example, we may add a feature in step 4 and repeat from there. This step focuses on the comprehensibility and utility of the induced model. In this step, the identified knowledge is also recorded for further use. The last step is the use of, and overall feedback on, the patterns and discovery results obtained by Data Mining.
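
For a predictive model, one common way to assess reliability is to compare predictions with known outcomes. The sketch below computes accuracy, precision, and recall; these particular measures, the function name, and the churn labels are assumptions chosen for illustration, since the slides do not prescribe specific metrics.

```python
def evaluate(actual, predicted, positive):
    """Compare predictions with known outcomes for one class of interest."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)  # true positives
    fp = sum(a != positive and p == positive for a, p in pairs)  # false positives
    fn = sum(a == positive and p != positive for a, p in pairs)  # false negatives
    correct = sum(a == p for a, p in pairs)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Hypothetical held-out outcomes and model predictions.
actual    = ["churn", "stay", "churn", "stay", "stay", "churn", "stay", "stay"]
predicted = ["churn", "stay", "stay",  "stay", "churn", "churn", "stay", "stay"]

report = evaluate(actual, predicted, positive="churn")
print(report)
```

Here six of eight predictions are correct, and two of the three predicted churners (and two of the three actual churners) are identified. If these figures fall short of the objectives set in step 1, the process loops back to an earlier step.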
The KDD Process
9. Using the discovered knowledge
• Now we are prepared to incorporate the knowledge into another system for further action. The knowledge becomes active in the sense that we may make changes to the system and measure the effects.
• The success of this step determines the effectiveness of the whole KDD process. There are numerous challenges in this step, such as losing the "laboratory conditions" under which we have worked.
• For example, the knowledge was discovered from a certain static snapshot (usually a set of data), but now the data becomes dynamic. Data structures may change (certain attributes become unavailable), and the data domain might be modified (for example, an attribute may take a value that was not expected previously).
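
One simple guard against losing the "laboratory conditions" is to remember what the static training data looked like and check live records against it. This is only a sketch for categorical attributes; the function names and the example records are hypothetical.

```python
def learn_schema(training_rows):
    """Record which values each attribute took in the static training set."""
    schema = {}
    for row in training_rows:
        for name, value in row.items():
            schema.setdefault(name, set()).add(value)
    return schema

def check_record(schema, record):
    """List the ways a live record departs from the training conditions."""
    problems = []
    for name, seen in schema.items():
        if name not in record:
            problems.append(f"missing attribute: {name}")
        elif record[name] not in seen:
            problems.append(f"unexpected value for {name}: {record[name]!r}")
    return problems

training = [
    {"plan": "basic", "region": "north"},
    {"plan": "premium", "region": "south"},
]
schema = learn_schema(training)

print(check_record(schema, {"plan": "basic", "region": "north"}))
print(check_record(schema, {"plan": "family"}))
```

The second record triggers both problems the slide describes: a value (`family`) that was never seen during discovery, and an attribute (`region`) that is no longer available. Such warnings signal that it may be time to run the KDD process again.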
