KDD-Knowledge Discovery in Databases

Knowledge Discovery in Databases (KDD) is a process for extracting useful knowledge from data, involving steps like data cleaning, integration, selection, transformation, mining, evaluation, and representation. KDD offers advantages such as improved decision-making and efficiency but also poses challenges like privacy concerns and high costs. It is an iterative process that requires careful handling of data quality and complexity to avoid unintended consequences.

Uploaded by

Jayesh Gupta

Knowledge Discovery in Databases

Knowledge discovery in databases (KDD) is the process of discovering useful
knowledge from a collection of data. This widely used data mining technique
includes data preparation and selection, data cleansing, incorporating prior
knowledge about the data sets, and interpreting accurate solutions from the
observed results.

KDD includes multidisciplinary activities, encompassing data storage and access,
scaling algorithms to massive data sets, and interpreting results. The data
cleansing and data access processes included in data warehousing facilitate the
KDD process. Artificial intelligence also supports KDD by discovering empirical
laws from experimentation and observations. The patterns recognized in the data
must be valid on new data and possess some degree of certainty. These patterns
are considered new knowledge.

Steps involved in the entire KDD process are:

1. Data Cleaning

Data cleaning is defined as the removal of noisy and irrelevant data from the
collection. It covers:

   1. Cleaning in case of missing values.

   2. Cleaning noisy data, where noise is a random error or variance in a
      measured variable.

   3. Cleaning with data discrepancy detection and data transformation tools.
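
The first two cleaning cases can be illustrated with a minimal sketch. It assumes mean imputation for missing values and smoothing by bin means for noise; both are common textbook choices rather than methods prescribed by these notes, and the function names and sample values are hypothetical.

```python
from statistics import mean

def fill_missing(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def smooth_by_bin_means(values, bin_size):
    """Sort the values, cut them into bins of bin_size, and
    replace every value with the mean of its bin."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        smoothed.extend([mean(bin_values)] * len(bin_values))
    return smoothed

# Hypothetical sample data.
ages = [25, None, 31, 40]
cleaned_ages = fill_missing(ages)

prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
smoothed_prices = smooth_by_bin_means(prices, 3)

print(cleaned_ages)
print(smoothed_prices)
```

Mean imputation keeps the column average unchanged but shrinks its variance, so the median or a model-based estimate may suit skewed data better.
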
2. Data Integration

Data integration is defined as combining heterogeneous data from multiple
sources into a common source (a data warehouse). It is carried out using data
migration tools, data synchronization tools, and the ETL (Extract, Transform,
Load) process.
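
As a rough illustration of integration, the sketch below runs a tiny extract, transform, load pass over two in-memory sources. The source names, field names, and the plain dict standing in for a data warehouse are all hypothetical; a real pipeline would read from databases or files.

```python
# Two hypothetical sources with different field names and units.
crm_rows = [
    {"cust_id": 1, "full_name": "Customer A"},
    {"cust_id": 2, "full_name": "Customer B"},
]
billing_rows = [
    {"customer": 1, "amount_cents": 1250},
    {"customer": 1, "amount_cents": 750},
    {"customer": 3, "amount_cents": 500},
]

def extract():
    """Extract: read the raw rows from each source."""
    return crm_rows, billing_rows

def transform(crm, billing):
    """Transform: map both sources onto one schema keyed by customer id."""
    unified = {}
    for row in crm:
        unified[row["cust_id"]] = {"name": row["full_name"], "total_spent": 0.0}
    for row in billing:
        # A customer known only to billing still gets a record.
        record = unified.setdefault(
            row["customer"], {"name": None, "total_spent": 0.0}
        )
        record["total_spent"] += row["amount_cents"] / 100
    return unified

def load(unified, warehouse):
    """Load: write the unified records into the target store."""
    warehouse.update(unified)
    return warehouse

warehouse = load(transform(*extract()), {})
print(warehouse)
```

Customer 3 appears only in the billing source, so its name stays `None`. Gaps like this are what the data cleaning step has to handle.
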

3. Data Selection

Data selection is defined as the process where data relevant to the analysis is
identified and retrieved from the data collection. For this we can use neural
networks, decision trees, naive Bayes, clustering, and regression methods.
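
In its simplest form, selection is a filter on rows plus a projection onto the relevant attributes. The sketch below shows only that basic case, with hypothetical records and a hypothetical `select` helper. It does not use any of the model-based methods listed above.

```python
# Hypothetical records; "notes" is irrelevant to the analysis.
records = [
    {"id": 1, "region": "north", "age": 34, "spend": 120.0, "notes": "call back"},
    {"id": 2, "region": "south", "age": 51, "spend": 80.0, "notes": ""},
    {"id": 3, "region": "north", "age": 27, "spend": 200.0, "notes": "vip"},
]

def select(rows, columns, predicate):
    """Keep the rows that satisfy predicate, restricted to the given columns."""
    return [{c: row[c] for c in columns} for row in rows if predicate(row)]

north_spend = select(
    records,
    columns=["id", "age", "spend"],
    predicate=lambda row: row["region"] == "north",
)
print(north_spend)
```
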

4. Data Transformation

Data transformation is defined as the process of transforming data into the
appropriate form required by the mining procedure. Data transformation is a
two-step process:

   1. Data mapping: Assigning elements from the source base to the destination
      to capture transformations.

   2. Code generation: Creation of the actual transformation program.
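
The two steps can be sketched as a mapping specification and a function built from it. Everything here is an assumption made for illustration: the field names, the `build_transformer` helper, and the min-max scaling added as an example of bringing a value into the form a mining procedure might expect.

```python
# Data mapping: destination field -> (source field, conversion function).
mapping = {
    "customer_id": ("cust_id", int),
    "income_usd": ("income", float),
    "segment": ("seg", str.upper),
}

def build_transformer(mapping):
    """'Code generation' in miniature: turn the mapping into a
    function that converts one source row into a destination row."""
    def transform(row):
        return {dest: convert(row[src]) for dest, (src, convert) in mapping.items()}
    return transform

transform = build_transformer(mapping)

# Hypothetical source rows, with every value stored as text.
source_rows = [
    {"cust_id": "7", "income": "30000", "seg": "a"},
    {"cust_id": "8", "income": "50000", "seg": "b"},
    {"cust_id": "9", "income": "70000", "seg": "a"},
]
rows = [transform(r) for r in source_rows]

# Min-max scaling of income to [0, 1]; assumes the incomes are not all equal.
incomes = [r["income_usd"] for r in rows]
lo, hi = min(incomes), max(incomes)
for r in rows:
    r["income_scaled"] = (r["income_usd"] - lo) / (hi - lo)

print(rows)
```

Keeping the mapping as data and separate from the code that applies it makes the transformation easy to review and change without touching the program logic.
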

5. Data Mining

Data mining is defined as the application of techniques to extract potentially
useful patterns. It transforms task-relevant data into patterns and decides the
purpose of the model using classification or characterization.
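
One concrete kind of pattern is a frequent itemset: a group of items that occur together in many transactions. The sketch below counts them by brute force on hypothetical shopping baskets. It only illustrates the idea and is not the Apriori algorithm or any other optimized miner.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def frequent_itemsets(transactions, max_size, min_count):
    """Count every itemset of up to max_size items and keep those
    that occur in at least min_count transactions."""
    counts = Counter()
    for basket in transactions:
        for size in range(1, max_size + 1):
            # Sorting gives each itemset one canonical tuple form.
            for itemset in combinations(sorted(basket), size):
                counts[itemset] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_count}

patterns = frequent_itemsets(transactions, max_size=2, min_count=2)
for itemset, count in sorted(patterns.items()):
    print(itemset, count)
```

Enumerating every combination grows quickly with basket size, which is why practical miners prune candidates instead of counting them all.
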

6. Pattern Evaluation

Pattern evaluation is defined as identifying the truly interesting patterns
representing knowledge, based on given measures. It finds the interestingness
score of each pattern and uses summarization and visualization to make the
data understandable to the user.
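
Support, confidence, and lift are three widely used interestingness measures for association rules. The notes do not name specific measures, so these are an assumption. The sketch below scores two hypothetical rules over a small set of baskets.

```python
# Hypothetical shopping baskets.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(itemset <= basket for basket in transactions)
    return hits / len(transactions)

def score_rule(antecedent, consequent):
    """Score the rule antecedent -> consequent."""
    both = support(antecedent | consequent)
    confidence = both / support(antecedent)
    lift = confidence / support(consequent)
    return {"support": both, "confidence": confidence, "lift": lift}

butter_to_bread = score_rule({"butter"}, {"bread"})
milk_to_bread = score_rule({"milk"}, {"bread"})

print("butter -> bread", butter_to_bread)
print("milk -> bread", milk_to_bread)
```

A lift above 1 means the two sides of the rule occur together more often than they would if independent. Here the butter rule has lift above 1 and the milk rule has lift below 1, so only the first would be kept as interesting under a lift threshold of 1.
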

7. Knowledge Representation

This involves presenting the results in a way that is meaningful and can be
used to make decisions.

Note: KDD is an iterative process where evaluation measures can be enhanced,
mining can be refined, and new data can be integrated and transformed in order
to get different and more appropriate results. Preprocessing of databases
consists of data cleaning and data integration.

Advantages of KDD
1. Improves decision-making: KDD provides valuable insights and knowledge that
   can help organizations make better decisions.

2. Increased efficiency: KDD automates repetitive and time-consuming tasks and
   makes the data ready for analysis, which saves time and money.

3. Better customer service: KDD helps organizations gain a better understanding
   of their customers’ needs and preferences, which can help them provide
   better customer service.

4. Fraud detection: KDD can be used to detect fraudulent activities by
   identifying patterns and anomalies in the data that may indicate fraud.

5. Predictive modeling: KDD can be used to build predictive models that can
   forecast future trends and patterns.

Disadvantages of KDD
1. Privacy concerns: KDD can raise privacy concerns as it involves collecting
   and analyzing large amounts of data, which can include sensitive information
   about individuals.

2. Complexity: KDD can be a complex process that requires specialized skills
   and knowledge to implement and to interpret the results.

3. Unintended consequences: KDD can lead to unintended consequences, such as
   bias or discrimination, if the data or models are not properly understood
   or used.

4. Data quality: The KDD process depends heavily on the quality of the data. If
   the data is not accurate or consistent, the results can be misleading.

5. High cost: KDD can be an expensive process, requiring significant
   investments in hardware, software, and personnel.
