Knowledge Discovery and Data Mining

8 X October 2020

https://doi.org/10.22214/ijraset.2020.32045
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.429
Volume 8 Issue X Oct 2020- Available at www.ijraset.com

Siddharth Nandakumar Chikalkar
Bachelor of Computer Application, Vivekanand College, Kolhapur

Abstract: Knowledge discovery in databases (KDD) plays an important role in large organisations, where data is stored in large
databases. It helps with exploring and understanding very large data sets and with building predictive models. KDD is a
task-oriented process for identifying valid, useful, and understandable patterns in large and complex data sets. Data mining is
the core of the KDD process: it applies algorithms to extract useful information, and the resulting models serve understanding,
analysis, and prediction. Every sector produces a growing volume of data, and with the help of such models we can recognise the
patterns and trends in each sector's large data sets.

I. INTRODUCTION
Data science is a field that every other field now needs. Data is produced rapidly every day, and it has to be handled every day
to increase productivity. Data mining incorporates quantitative or mathematical methods, which may include mathematical equations
and algorithms. Prominent methodologies include traditional logistic regression, neural networks, segmentation, classification,
and clustering, all of which rely on mathematics. Data mining is applicable across industry sectors, generally wherever there are
processes and data. It applies these mathematical techniques together with statistical inference testing to extract trends and
patterns. Basically, data mining is the process of turning raw data into useful information, and it has many phases for analysing
data and extracting that information. In this paper we look at all of these steps in KDD, that is, knowledge discovery in
databases. KDD is the process of finding knowledge in large databases, and it is the procedure within which data mining is
carried out.

II. WHY WE NEED DATA MINING

The volume of information increases rapidly every day: we handle business transactions, sensor data, scientific data, videos,
pictures, and more. The evolution of technology increases the production of data every day, which is why data has grown
explosively from terabytes to petabytes. Data has become easily available through automated data collection tools, database
systems, the web, and a computerised society, so it is available in large amounts. Business supplies data from the web,
e-commerce, transactions, stocks, and so on. Science supplies remote sensing data, bioinformatics data, scientific simulations,
and so on. Above all, a great deal of data comes from society and from everyone: news, cameras, YouTube, and social media
platforms such as Instagram, Facebook, Twitter, Snapchat, and many more. We therefore need some kind of system that is capable of
extracting the essence of the available information and that can automatically generate reports, views, or summaries of the data
for decision making.

Figure: Knowledge discovery in databases

III. KNOWLEDGE DISCOVERY PROCESS

The KDD process is used on large data sets to identify patterns and trends, and it has many phases. By contrast, the traditional
method of turning data into knowledge relies on manual analysis and interpretation. The process starts with determining the KDD
goals and ends with the implementation of the discovered knowledge. The process is iterative at each step, meaning that moving
back to previous steps may be required. The first task is to understand the goal of the end user; the process then begins with
data cleaning.

A. Data Cleaning
Data cleaning is the first step of the process. It is defined as the removal of noisy and irrelevant data from the collection.
The data for data mining comes from multiple sources, so some of it may be irrelevant to the mining task; in this step we clean
the data and extract the relevant data from all the sources we have. The data from multiple sources is integrated into a common
source known as a data warehouse.
Different types of data source are used in the data mining process. The first is flat files: data files in text or binary form
that data mining algorithms can extract easily. The second is relational databases, where the data is held in rows and columns;
the logical schema defines the tables and the relationships among them, while the physical schema defines how they are stored.
Next are transactional databases, where the data is organised by time stamps and dates to represent transactions. This type of
database is capable of rolling back or undoing an operation when a transaction is not completed or committed. Next are multimedia
databases, which consist of audio, video, image, and text media and can be stored in object-oriented databases. Next are spatial
databases, which hold geographical information. Next is time series data, such as stock exchange data and user log data. Last is
the World Wide Web (WWW), a collection of audio, video, text, and other resources that are identified by uniform resource
locators (URLs) and accessed through web browsers. These are all the types of source from which we gather data for data mining.
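As an illustration of the cleaning step, the following minimal Python sketch drops duplicate and noisy records and keeps only the task-relevant fields. The records, the field names, and the accepted age range of 0 to 120 are assumptions made for this example; they are not taken from a real data set.

```python
from typing import Optional

# Hypothetical raw records gathered from several sources (illustrative only).
RAW_RECORDS = [
    {"id": 1, "age": "34", "city": "Kolhapur", "session_token": "x1"},
    {"id": 2, "age": "-5", "city": "Pune", "session_token": "x2"},      # impossible age (noise)
    {"id": 1, "age": "34", "city": "Kolhapur", "session_token": "x9"},  # duplicate id
    {"id": 3, "age": "n/a", "city": "Mumbai", "session_token": "x3"},   # unparseable age (noise)
    {"id": 4, "age": "51", "city": "Satara", "session_token": "x4"},
]

# Assumption: session_token is irrelevant to the analysis task.
RELEVANT_FIELDS = ("id", "age", "city")


def parse_age(text: str) -> Optional[int]:
    """Return the age as an int, or None when the value is noisy."""
    try:
        age = int(text)
    except ValueError:
        return None
    return age if 0 <= age <= 120 else None


def clean(records):
    """Drop noisy rows and duplicate ids, and keep only the relevant fields."""
    seen_ids = set()
    cleaned = []
    for record in records:
        age = parse_age(record["age"])
        if age is None or record["id"] in seen_ids:
            continue
        seen_ids.add(record["id"])
        row = {field: record[field] for field in RELEVANT_FIELDS}
        row["age"] = age  # store the parsed integer instead of the raw string
        cleaned.append(row)
    return cleaned


cleaned_records = clean(RAW_RECORDS)
for row in cleaned_records:
    print(row)
```

Only the records with ids 1 and 4 survive in this example. In practice the rules for what counts as noisy or irrelevant depend on the goal of the end user.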

B. Data Pre-processing
In this stage, multiple data sources are combined. After data cleaning we have data from various sources, so here we integrate
it. Then only the data that is relevant to the analysis task is retrieved from the database. This leaves the data in a
consistent state for applying algorithms. A common problem at this stage is that some values are missing, so we have to fill
them in. There are various ways to do this: we can fill in a value manually, use the attribute mean, or use the most probable
value. Regression is another option: the data can be smoothed by fitting it to a regression function. The regression used may be
linear (having one independent variable) or multiple (having multiple independent variables). The data extracted by
pre-processing is important because it describes useful information: it is integrated from one or more disparate sources, and it
helps an organisation with decision making.
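The strategies for filling missing values can be sketched in a few lines of Python. This is a minimal illustration, assuming that missing values are marked with None and that the regression is a simple least-squares line with one independent variable; the function names and the sample figures are hypothetical.

```python
from statistics import mean, mode


def fill_with_mean(values):
    """Replace missing numeric values with the attribute mean."""
    known = [v for v in values if v is not None]
    attribute_mean = mean(known)
    return [attribute_mean if v is None else v for v in values]


def fill_with_mode(values):
    """Replace missing values with the most frequent known value."""
    known = [v for v in values if v is not None]
    most_common = mode(known)
    return [most_common if v is None else v for v in values]


def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar


def fill_with_regression(xs, ys):
    """Predict missing y values from x using a line fitted to the known pairs."""
    known = [(x, y) for x, y in zip(xs, ys) if y is not None]
    slope, intercept = fit_line([x for x, _ in known], [y for _, y in known])
    return [slope * x + intercept if y is None else y for x, y in zip(xs, ys)]


# Hypothetical attributes: years of experience and salary (in thousands).
experience = [1, 2, 3, 4, 5]
salary = [10, 20, None, 40, 50]
city = ["Pune", None, "Pune", "Mumbai"]

print(fill_with_mean(salary))
print(fill_with_mode(city))
print(fill_with_regression(experience, salary))
```

The mean and the most frequent value ignore the other attributes, whereas the regression fill uses a second attribute to estimate the missing one, which tends to help when the two attributes are strongly related.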

C. Data Transformation
In data transformation, the data is transformed into a form appropriate for data mining. There are several steps. The first is
smoothing, in which noise is eliminated from the data by an algorithm. Smoothing highlights important features of the data set
and helps in predicting patterns; after smoothing, even simple changes can be identified to help predict different trends and
patterns. The next step is aggregation, in which the data is stored in a summary format. The data is integrated into a data
analysis description, and this collection is useful for decisions concerning strategy, product pricing, operations, and
marketing. After that comes discretization, which transforms continuous data into a set of small intervals. Many data mining
activities involve continuous attributes, but not every data mining algorithm can handle them directly. Discretization can also
improve efficiency by replacing the values of a continuous attribute with discrete interval labels such as 1-10 and 11-20.
Another data transformation procedure is normalization, which converts all data variables into a given range. It is generally
required when we are dealing with attributes on different scales. Methods for data normalization include decimal scaling, min-max
normalization, and z-score normalization. After all of these steps, the data is ready for data mining.
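The three normalization methods and the discretization step can be sketched as follows. This is an illustrative Python sketch, not a reference implementation: it assumes the values are not all identical (otherwise min-max and z-score would divide by zero), uses the population standard deviation for the z-score, and assumes positive integer inputs and a hypothetical interval width of 10 for discretization.

```python
from statistics import mean, pstdev


def min_max(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly into the range [new_min, new_max]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


def z_score(values):
    """Centre values on the mean and scale by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def decimal_scaling(values):
    """Divide by the smallest power of 10 that brings every |value| below 1."""
    largest = max(abs(v) for v in values)
    j = 0
    while largest / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values]


def discretize(value, width=10):
    """Map a positive integer to an interval label such as '1-10' or '11-20'."""
    lower = ((value - 1) // width) * width + 1
    return f"{lower}-{lower + width - 1}"


incomes = [200, 400, 600, 1000]  # hypothetical attribute values
print(min_max(incomes))
print(z_score(incomes))
print(decimal_scaling(incomes))
print([discretize(age) for age in (7, 10, 11, 20)])
```

Min-max normalization preserves the relative spacing of the values inside a fixed range, while the z-score is often preferred when the minimum and maximum are unknown or distorted by outliers.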

D. Data Mining
This step is important: we now have to decide which type of data mining to use, for example regression or clustering. In this
step, intelligent methods are applied to extract useful patterns from the transformed data. The patterns are extracted by
algorithms such as C4.5, k-means, and expectation-maximization (EM). K-means and expectation-maximization are generally used in
the data mining step of KDD.
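To make the clustering case concrete, the following is a minimal k-means sketch in plain Python. It is a simplified illustration under stated assumptions: the initial centroids are simply the first k points (practical implementations usually choose them randomly or with a seeding heuristic), distances are Euclidean, and the two-dimensional sample points are invented for the example.

```python
from math import dist
from statistics import mean


def k_means(points, k, iterations=10):
    """Cluster points into k groups; return (centroids, clusters)."""
    centroids = list(points[:k])  # assumption: deterministic initialisation
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for point in points:
            nearest = min(range(k), key=lambda i: dist(point, centroids[i]))
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                new_centroids.append(tuple(mean(axis) for axis in zip(*cluster)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:  # converged: assignments no longer change
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical two-dimensional data with two visibly separate groups.
POINTS = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.0), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]

centroids, clusters = k_means(POINTS, k=2)
for centroid, cluster in zip(centroids, clusters):
    print(centroid, cluster)
```

On this sample the procedure converges after a few iterations and separates the three points near the origin from the three points near (8.5, 9). The result of k-means in general depends on the initial centroids, so different starting points can lead to different clusters.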

IV. PATTERN EVALUATION

In this stage, the patterns obtained from data mining are identified and converted into knowledge. Summarization and
visualization techniques are used to make the data understandable to the user. By this stage of knowledge discovery, the patterns
and trends have been identified, and this useful information is presented for strategy and prediction.
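Summarization can be illustrated with a short Python sketch that condenses mined groups into a few figures per group. The labelled values below are hypothetical stand-ins for the output of a mining step, and the chosen statistics (count, minimum, maximum, mean) are one possible summary among many.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical mining output: (segment label, monthly spend) pairs.
MINED = [("low", 120), ("low", 80), ("high", 900), ("high", 1100), ("low", 100)]


def summarize(labelled_values):
    """Group values by label and compute simple summary statistics per group."""
    groups = defaultdict(list)
    for label, value in labelled_values:
        groups[label].append(value)
    return {
        label: {
            "count": len(values),
            "min": min(values),
            "max": max(values),
            "mean": mean(values),
        }
        for label, values in groups.items()
    }


summary = summarize(MINED)
for label, stats in summary.items():
    print(label, stats)
```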

V. KNOWLEDGE REPRESENTATION
Knowledge representation is defined as a technique that uses visualization tools to represent data mining results, for example by
generating reports, tables, discriminant rules, classification rules, or characterization rules.
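As a small example of representing results as a table, the following Python sketch renders rows as an aligned plain-text report. The function name render_table, the column headers, and the figures are hypothetical and serve only to illustrate the idea.

```python
def render_table(headers, rows):
    """Render headers and rows as an aligned plain-text table."""
    text_rows = [list(headers)] + [[str(cell) for cell in row] for row in rows]
    # Each column is as wide as its longest cell.
    widths = [max(len(row[i]) for row in text_rows) for i in range(len(headers))]

    def format_row(row):
        return " | ".join(cell.ljust(width) for cell, width in zip(row, widths))

    lines = [format_row(text_rows[0]), "-+-".join("-" * width for width in widths)]
    lines += [format_row(row) for row in text_rows[1:]]
    return "\n".join(lines)


# Hypothetical mining results: one row per customer segment.
report = render_table(
    ["segment", "customers", "mean spend"],
    [("low", 3, 100), ("high", 2, 1000)],
)
print(report)
```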

VI. CONCLUSION
The objective of this research paper is to study the KDD process. In this paper we presented the different phases of the
knowledge discovery process. The KDD process is one of the best ways of finding trends and patterns in large data sets. We
described which types of algorithm are used in data mining, which is the core of the KDD process, and how the data transformation
process takes place. The main advantage of the integrated approach is that the pre-processing steps become much easier and more
convenient for data mining. Data pre-processing and data transformation are also important phases of KDD, and extracting
task-relevant, useful data in these phases is very challenging.

