DMjoy

Q.1) How is Knowledge Discovery done from a Database?

Ans: Knowledge Discovery in Databases (KDD) is the process of extracting useful, previously unknown, and potentially valuable information from large datasets. KDD is iterative and usually requires several passes through the following steps to extract accurate knowledge from the data (a minimal code sketch follows the list):
1. Data Cleaning: Removal of noisy and irrelevant data from the collection. This includes handling missing values, smoothing noisy data (noise being a random or variance error), and using data discrepancy detection and data transformation tools.
2. Data Integration: Combining heterogeneous data from multiple sources into a common store (a data warehouse). It uses data migration tools, data synchronization tools, and the ETL (Extract-Transform-Load) process.
3. Data Selection: Deciding which data are relevant to the analysis and retrieving them from the data collection. Methods such as neural networks, decision trees, naive Bayes, clustering, and regression may then be applied to the selected data.
4. Data Transformation: Transforming the data into the form required by the mining procedure. This is a two-step process: data mapping (assigning elements from the source to the destination to capture the transformations) and code generation (creating the actual transformation program).
5. Data Mining: Applying techniques to extract potentially useful patterns. This step transforms the task-relevant data into patterns, and the purpose of the model is decided (e.g., classification or characterization).
6. Pattern Evaluation: Identifying the truly interesting patterns that represent knowledge, based on given interestingness measures. An interestingness score is computed for each pattern, and summarization and visualization are used to make the results understandable to the user.
7. Knowledge Representation: Presenting the results in a form that is meaningful and can be used to make decisions.
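
The steps above can be pictured as one small pipeline. The sketch below uses pandas and scikit-learn; the tables, column names, and the choice of a decision tree are assumptions made purely for illustration, not part of any fixed KDD recipe.

```python
# A minimal, illustrative KDD pipeline (pandas + scikit-learn).
# The tables, columns, and the model choice are made up for this sketch.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.metrics import accuracy_score

# 1-2. Cleaning and integration: combine two hypothetical sources, drop missing rows.
customers = pd.DataFrame({"customer_id": range(8),
                          "age": [25, 32, 47, 51, 23, 36, None, 44],
                          "income": [30, 45, 80, 90, 28, 55, 60, 75]})
transactions = pd.DataFrame({"customer_id": range(8),
                             "num_purchases": [2, 5, 9, 11, 1, 6, 4, 8],
                             "churned": [1, 1, 0, 0, 1, 0, 1, 0]})
data = customers.merge(transactions, on="customer_id").dropna()

# 3. Selection: keep only the attributes relevant to the analysis task.
selected = data[["age", "income", "num_purchases", "churned"]]

# 4. Transformation: scale the numeric features into a mining-friendly form.
X = StandardScaler().fit_transform(selected.drop(columns="churned"))
y = selected["churned"]

# 5. Mining: learn a classification model (one possible mining technique).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# 6. Pattern evaluation: score how well the discovered patterns generalize.
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))

# 7. Knowledge representation: present the learned rules in readable form.
print(export_text(model, feature_names=["age", "income", "num_purchases"]))
```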

Q.2) What kinds of data can be mined?

Ans:

1. Relational Databases: A database system consisting of a set of interrelated data, called a database, and a set of software programs to manage and access the data.
2. Transactional Databases: A database consisting of a file in which each record represents a transaction.
3. Object-Relational Databases: Databases built on an object-relational data model.
4. Temporal Databases: Databases that store relational data containing time-related attributes.
5. Sequence Databases: Databases that store sequences of ordered events, with or without a concrete notion of time.
6. Time-Series Databases: Databases that store sequences of values or events obtained over repeated measurements of time (e.g., hourly, daily, weekly).
7. Multimedia Data: Image, video, and audio data, as well as website hyperlinks and linkages.
8. Web Data: Web mining is used to discover important patterns and knowledge from the Web.
9. Text Data: Text mining, a subfield of data mining that draws on machine learning, natural language processing, and statistics, is used to extract knowledge from text.
Q.3) What do you understand by central tendency of data?

Ans:

In data mining, the central tendency of a dataset is a measure that identifies the "center" or typical value of the dataset. There are three common measures of central tendency:
1. Mean: The sum of all values divided by the total number of values. It can be calculated for both ungrouped and grouped data. For ungrouped data, it is the sum of all observations divided by the total number of observations. For grouped data, it is the sum of the products of the observations and their corresponding frequencies divided by the sum of all frequencies.
2. Median: The middle number in an ordered dataset. If the dataset has an even number of values, the median is the average of the two middle numbers.
3. Mode: The most frequent value in the dataset. A dataset may have one mode, more than one mode, or no mode at all.
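
As a quick, self-contained illustration, the sketch below computes all three measures on a small made-up sample using Python's statistics module.

```python
# Mean, median, and mode of a small made-up sample.
import statistics

values = [4, 8, 8, 5, 3, 8, 6, 5]

print("mean:",   statistics.mean(values))    # 47 / 8 = 5.875
print("median:", statistics.median(values))  # average of the two middle values: (5 + 6) / 2 = 5.5
print("mode:",   statistics.mode(values))    # most frequent value: 8
```

For grouped data the same idea applies with frequencies: mean = sum(f_i * x_i) / sum(f_i).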

Q.4) Data Characterization and Data Discrimination.

Ans:

Data characterization and data discrimination are two data analysis techniques used for class/concept description.

Data characterization is the process of summarizing the data of the class under study, called the target class.

Example - A data mining query for characterization. Suppose that a user wants to describe the general characteristics of graduate students in the Big University database, given the attributes name, gender, major, birth place, birth date, residence, phone# (telephone number), and gpa (grade point average).

Data discrimination is the process of comparing the data of the target class with the data of one or more other classes, called the contrasting classes, and finding the features that distinguish them.

Cluster analysis is a popular data discretization method. Data discretization and concept hierarchy generation are also forms of data reduction: the raw data are replaced by a smaller number of interval or concept labels, which simplifies the original data and makes the mining more efficient.

Example - A clustering algorithm can be applied to discretize a numeric attribute, A, by partitioning the values of A into clusters or groups. Clustering takes the distribution of A into consideration, as well as the closeness of data points, and is therefore able to produce high-quality discretization results.
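
A minimal sketch of this clustering-based discretization, assuming scikit-learn's KMeans and a made-up numeric attribute (the values and the choice of three clusters are purely illustrative):

```python
# Discretize a numeric attribute A by clustering its values (illustrative only).
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical numeric attribute A (e.g., ages).
A = np.array([18, 21, 22, 25, 34, 36, 38, 41, 60, 62, 65, 70]).reshape(-1, 1)

# Partition the values of A into 3 clusters; each cluster becomes one interval/concept label.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(A)

for value, label in zip(A.ravel(), kmeans.labels_):
    print(f"A = {value:3d}  ->  label {label}")
```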

Q.5) What do you mean by Supervised and Unsupervised Learning?

Ans: Supervised learning -> is basically a synonym for classification. The supervision in the learning comes from the labeled examples in the training data set. For example, in the postal code recognition problem, a set of handwritten postal code images and their corresponding machine-readable translations are used as the training examples, which supervise the learning of the classification model.

Unsupervised learning -> is essentially a synonym for clustering. The learning process is unsupervised since the input examples are not class labeled. Typically, we may use clustering to discover classes within the data. For example, an unsupervised learning method can take, as input, a set of images of handwritten digits. Suppose that it finds 10 clusters of data. These clusters may correspond to the 10 distinct digits of 0 to 9, respectively. However, since the training data are not labeled, the learned model cannot tell us the semantic meaning of the clusters found.
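
The handwritten-digit example can be sketched in code. Below, scikit-learn's small built-in digits dataset stands in for the postal-code images; logistic regression (supervised) and k-means (unsupervised) are assumed model choices for illustration.

```python
# Supervised (classification) vs. unsupervised (clustering) learning on digit images.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X, y = load_digits(return_X_y=True)   # 8x8 images of handwritten digits, with labels

# Supervised: the labels y supervise the training of a classifier.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=5000).fit(X_train, y_train)
print("classification accuracy:", clf.score(X_test, y_test))

# Unsupervised: the labels are ignored; the algorithm only groups similar images.
# It may find 10 clusters, but it cannot say which digit each cluster represents.
clusters = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(X)
print("cluster of the first image:", clusters[0])
```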

Q6. What is data warehousing?

Ans: A data warehouse integrates data originating from multiple sources and various timeframes. It consolidates data in multidimensional space to form partially materialized data cubes. A Data Warehouse (DW) is a relational database designed for query and analysis rather than transaction processing. It is a centralized data repository that can be queried for business benefit. It includes historical data derived from transaction data from single and multiple sources. A data warehouse provides integrated, enterprise-wide, historical data and focuses on supporting decision-makers in data modeling and analysis.
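
As a toy illustration of the multidimensional, analysis-oriented view a warehouse serves, the sketch below aggregates made-up sales records along two dimensions with pandas (the records and column names are assumptions, not a real warehouse schema):

```python
# A tiny "data cube"-style summary over warehouse-like sales records.
import pandas as pd

sales = pd.DataFrame({
    "region":  ["East", "East", "West", "West", "East", "West"],
    "quarter": ["Q1",   "Q2",   "Q1",   "Q2",   "Q1",   "Q2"],
    "amount":  [100,    120,     90,    200,    150,    180],
})

# Roll up the amount measure along the region x quarter dimensions.
cube = sales.pivot_table(values="amount", index="region",
                         columns="quarter", aggfunc="sum")
print(cube)
```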

Q.7) Types of machine learning?

Ans:
1. Supervised Machine Learning: The model is trained on a "labelled dataset", i.e., a dataset containing both input and output parameters. In supervised learning, algorithms learn to map inputs to the correct outputs. It has two main categories:
o Classification: Classification algorithms are used when the output variable is categorical.
o Regression: Regression algorithms are used when the output is a real or continuous value (see the sketch after this list).
2. Unsupervised Machine Learning: The model is trained on an unlabelled dataset and must find patterns within the data on its own.
3. Semi-Supervised Machine Learning: Falls between supervised and unsupervised learning; it uses both labelled and unlabelled data for training.
4. Reinforcement Learning: Learning to take suitable actions so as to maximize reward in a particular situation.
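
A brief sketch of the regression category, using scikit-learn's LinearRegression on made-up data (the feature, target, and values are assumptions for illustration):

```python
# Regression: predicting a continuous value from a labelled dataset.
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data: house size (sq. m) -> price (thousands).
sizes  = np.array([[50], [60], [80], [100], [120]])
prices = np.array([150, 180, 240, 300, 360])

model = LinearRegression().fit(sizes, prices)
print("predicted price for 90 sq. m:", model.predict([[90]])[0])
```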
Q.8 Major issues in data mining?

Ans:

Data mining is a complex process and it faces several issues. Here are the major ones:
1. Mining Methodology and User Interaction Issues:
o Mining different kinds of knowledge in databases: Different users may be interested in different kinds of knowledge, so data mining must cover a broad range of knowledge discovery tasks.
o Interactive mining of knowledge at multiple levels of abstraction: The data mining process needs to be interactive because this allows users to focus the search for patterns, providing and refining data mining requests based on the returned results.

2. Data Security & Privacy: Ensuring that sensitive information is not mined or misused can pose challenges.
3. Efficiency and scalability of data mining algorithms: To effectively extract information from the huge amounts of data in databases, data mining algorithms must be efficient and scalable.
4. Parallel, distributed, and incremental mining algorithms: Factors such as the huge size of databases, the wide distribution of data, and the complexity of data mining methods motivate the development of parallel and distributed data mining algorithms.
Q.9 What do you mean by nominal and ordinal attributes?

Ans:

Nominal and ordinal attributes are two types of categorical data used in statistics and data analysis. They differ in their level of measurement and in the types of operations that can be performed on them.

Nominal Attributes:

 Nominal data, a kind of categorical data, are used for labeling variables that have no quantitative value.
 The name "nominal" comes from the Latin word "nomen", which means "name".
 Nominal data are items that are distinguished by a simple naming system.
 They carry no numeric value; profession is one example.
 The values grouped into these categories have no meaningful order.
 For example, gender and occupation are nominal attributes.

Ordinal Attributes:

 Ordinal data are categorical data with an order (or rank) among the values.
 Ordinal attributes contain values that have a meaningful sequence or ranking (order) between them, but the magnitude between successive values is not known.
 The order of the values shows what is more important, but not how much more important it is.

In summary, the main difference between nominal and ordinal data lies in whether there is an order or rank to the categories: nominal categories have no inherent order, while ordinal categories do have a clear ordering.
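
A short sketch of how the two attribute types can be represented in code, using pandas categoricals on made-up values (the specific categories are assumptions for illustration):

```python
# Nominal vs. ordinal attributes represented with pandas categoricals.
import pandas as pd

# Nominal: categories with no meaningful order (e.g., occupation).
occupation = pd.Categorical(["nurse", "teacher", "engineer", "nurse"], ordered=False)

# Ordinal: categories with a meaningful order but unknown spacing (e.g., shirt size).
size = pd.Categorical(["small", "large", "medium", "small"],
                      categories=["small", "medium", "large"], ordered=True)

print(occupation.categories)         # no order is implied among these labels
print(size.min(), "<", size.max())   # order comparisons are meaningful only for ordinal data
```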
Q. What is the five-number summary and what is it used for?

Ans:

The five-number summary is a method used in descriptive statistics that summarizes a set of data. It consists of the following five numbers:

1. Minimum: The smallest number in the dataset.
2. First Quartile (Q1): The middle number between the minimum and the median. This is also known as the lower quartile.
3. Median (Q2): The middle value of the dataset. If the dataset has an even number of observations, the median is the average of the two middle numbers.
4. Third Quartile (Q3): The middle value between the median and the maximum. This is also known as the upper quartile.
5. Maximum: The largest number in the dataset.

The five-number summary is useful because it provides a concise summary of the distribution of the data in several ways:

 It tells us where the middle value is located, using the median.
 It tells us how spread out the data are, using the first and third quartiles.
 It tells us the range of the data, using the minimum and maximum.

These five statistics provide similar information to other summary statistics while having some advantages over them: they are less sensitive to skewed distributions and outliers, making them more robust than measures such as the mean and standard deviation. This makes them extremely helpful for analyses performed when you are just starting to understand your data. They are also valid for both continuous and ordinal data, giving you greater flexibility.
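
A small sketch computing the five-number summary with NumPy on a made-up dataset:

```python
# Five-number summary of a made-up dataset using NumPy percentiles.
import numpy as np

data = np.array([7, 15, 36, 39, 40, 41, 42, 43, 47, 49])

minimum = data.min()
q1, median, q3 = np.percentile(data, [25, 50, 75])
maximum = data.max()

print("min:", minimum, " Q1:", q1, " median:", median, " Q3:", q3, " max:", maximum)
```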
