Data Mining UNIT - 1 (Important)

Uploaded by deepakjami27

💽 Data Mining

Important Questions :

1. What is Data Mining and Data Architecture?


ANS :
Data Mining :
Data mining is the process of extracting or mining knowledge from large
amounts of data. It is also known as :

Knowledge mining from data.

Knowledge extraction

Data Pattern Analysis.

Information Harvesting.

Data Archeology.

Data Dredging.

Data Architecture :
Data architecture refers to the design and structure of data systems, specifying
how data is collected, stored, integrated, and managed across an organization.

2. Explain KDD Process with a neat diagram.


ANS :
KDD ( Knowledge Discovery in Databases) Process :

KDD is an iterative process of converting raw data into useful information
and knowledge through data mining.

The mining step can be refined iteratively.

New data can be integrated and transformed in order to get different and
more appropriate results.

Preprocessing of databases consists of data cleaning and data integration.

The following steps are included in the KDD process :

1. Data Cleaning : Removal of noisy and irrelevant data from the collection.

Cleaning in case of missing values.

Cleaning noisy data, where noise is a random error or variance in a measured variable.

Cleaning with data discrepancy detection and data transformation tools.
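Two of these cleaning steps can be sketched in plain Python (the price list and the bin size are invented for illustration): filling missing values with the attribute mean, and smoothing noisy values by equal-depth bin means.

```python
def fill_missing_with_mean(values):
    """Replace None entries with the mean of the known values."""
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]

def smooth_by_bin_means(values, bin_size):
    """Smooth noisy data: sort, partition into equal-depth bins,
    and replace each value by its bin's mean."""
    ordered = sorted(values)
    smoothed = []
    for i in range(0, len(ordered), bin_size):
        bin_ = ordered[i:i + bin_size]
        smoothed.extend([sum(bin_) / len(bin_)] * len(bin_))
    return smoothed

prices = [4, 8, None, 15, 21, 21, 24, 25, None, 34]  # invented data
filled = fill_missing_with_mean(prices)
print(smooth_by_bin_means(filled, bin_size=5))
```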

2. Data Integration : It is defined as combining heterogeneous data from
multiple sources into a common source ( Data Warehouse ).

Data integration using data migration tools.

Data integration using data synchronization tools.

Data integration using the ETL (Extract-Transform-Load) process.
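A toy ETL pass can make the idea concrete. All the source names and fields below (`crm_source`, `billing_source`, `cust_id`, `amount_paise`) are invented for illustration: records are extracted from two heterogeneous sources, transformed into one schema, and loaded into a unified store.

```python
# Hypothetical source records with inconsistent schemas.
crm_source = [{"cust_id": 1, "name": "Asha"}, {"cust_id": 2, "name": "Ravi"}]
billing_source = [{"customer": 1, "amount_paise": 49900},
                  {"customer": 2, "amount_paise": 129900}]

def extract_transform_load(crm, billing):
    """Combine heterogeneous records under one schema (a tiny ETL pass)."""
    warehouse = {}
    for row in crm:                       # Extract from source 1
        warehouse[row["cust_id"]] = {"name": row["name"], "total_rupees": 0.0}
    for row in billing:                   # Extract from source 2
        cid = row["customer"]
        # Transform: unify key names and convert paise to rupees.
        warehouse[cid]["total_rupees"] += row["amount_paise"] / 100
    return warehouse                      # Load: the unified store

print(extract_transform_load(crm_source, billing_source))
```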

3. Data Selection : It is defined as the process where data relevant to the
analysis is decided and retrieved from the data collection.

Data Selection is done using :

Neural Network.

Decision Tree.

Naive Bayes.

Clustering, Regression, etc.

4. Data Transformation : It is defined as the process of transforming data
into a form appropriate for the mining procedure.

Data transformation is a two step process :

Data Mapping : Assigning elements from the source base to the
destination to capture transformations.

Code Generation : Creation of the actual transformation program.
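The two-step idea above can be sketched as follows, with a hypothetical mapping spec (`cust_name`, `amt`, and the converters are invented) standing in for data mapping, and `make_transform` standing in for code generation:

```python
# A hypothetical mapping from source fields to warehouse fields,
# with a converter per field; "code generation" turns this spec
# into an executable transformation function.
mapping = {
    "cust_name": ("name", str.title),
    "amt":       ("amount", float),
}

def make_transform(spec):
    """Generate a transformation function from a data-mapping spec."""
    def transform(record):
        return {dest: conv(record[src]) for src, (dest, conv) in spec.items()}
    return transform

transform = make_transform(mapping)
print(transform({"cust_name": "asha rao", "amt": "499.50"}))
```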

5. Data Mining : Clever techniques are applied to extract data patterns.

Transforms task-relevant data into patterns.

Decides the purpose of the model, using classification or characterization.

6. Pattern Evaluation : It is used to identify the truly interesting patterns
representing knowledge, using interestingness measures.

Finds an interestingness score for each pattern.

Uses summarization and visualization to make data understandable by
the user.

7. Knowledge Representation : It is defined as a technique which utilizes
visualization tools to represent data mining results.

Generate Reports.

Generate Tables.

Generate discriminant rules, classification rules, characterization rules,
etc.

3. Explain Data Mining Architecture with a neat
diagram.
ANS :

1. Database : It is a collection of organized data.

2. Data Warehouse : A data warehouse is a large, centralized repository that
stores and manages data from multiple sources, designed for query,
reporting, and analysis.

3. WWW (World Wide Web) : This is a system of interconnected documents
and resources that can be accessed through the internet using web
browsers.

4. Data cleaning, integration and selection :

Data cleaning and integration can be performed on the data, with the
resulting data stored in the data warehouse server.

Sometimes data transformation and consolidation are performed before
the data selection process.

5. DWS (Data Warehouse Server) :

Responsible for fetching the relevant data based on the user's mining
request.

Knowledge Base : This is the domain knowledge that is used to guide the
search and to evaluate the interestingness of resulting patterns.

Its main objective is to make results more accurate and reliable.

6. DMS (Data Mining System) : It consists of a set of functional modules
such as :

Characterization and classification.

Cluster analysis.

Prediction.

Evaluation.

Deviation analysis.

7. Pattern Evaluation :

It employs interestingness measures and interacts with the DM modules
to focus the search towards interesting patterns for efficient data mining.

It is highly recommended to push the evaluation of interestingness as
deep as possible into the mining process, to get only interesting patterns.

8. User Interface : This helps the user to interact with the data mining system
by submitting a data mining query.

4. Explain Data Mining Functionalities with examples.


ANS :

Data mining functionalities refer to the different types of tasks and techniques
used to extract useful patterns, knowledge, and insights from large datasets.

Below are the key functionalities:

Multidimensional Concept Description : It includes characterization
(summarizing data) and discrimination (comparing and contrasting data).

Example : dry vs. wet regions.

Frequent pattern, association, correlation vs causality :

1. Frequent Pattern: Identifies items, sequences, or structures that occur
frequently in a dataset.

Example : In market basket analysis, finding that "milk" and "bread" are
frequently bought together.

2. Association: Discovers rules that describe relationships between items.

Example : If a customer buys bread, they are 80% likely to buy butter.
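Where such a number comes from can be sketched with a toy basket dataset (the transactions below are invented): support is the fraction of baskets containing an itemset, and the confidence of "bread, so butter" is support(bread and butter) divided by support(bread).

```python
# Invented transactions for a market-basket illustration.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]

def support(itemset):
    """Fraction of transactions that contain the whole itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    return support(antecedent | consequent) / support(antecedent)

print(support({"bread", "butter"}))
print(confidence({"bread"}, {"butter"}))
```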

3. Correlation: Measures how two items are related in terms of their
occurrence together.

Example : Correlating weather conditions with product sales.
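One standard way to measure such a relationship is the Pearson coefficient; a minimal from-scratch sketch (the temperature and sales figures are invented):

```python
def pearson(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Invented daily temperatures vs. cold-drink sales.
temps = [30, 32, 35, 28, 40]
sales = [200, 220, 260, 180, 320]
print(pearson(temps, sales))   # close to 1: strong positive correlation
```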

4. Causality: Explains cause-and-effect relationships, unlike correlation,
which only measures association.

Example : A rise in ice cream sales coincides with a rise in ice
cream-related injuries, but one need not cause the other (correlation ≠
causation).

Classification and Prediction :

1. Classification: Assigns items to predefined classes or groups based on
certain attributes.

Example : Classifying customers as high-risk or low-risk based on their
credit history.

2. Prediction: Predicts a future or unknown outcome using historical data,
often in numerical form (e.g., regression).

Example : Predicting a customer’s future spending behavior based on
past transactions.
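Classification can be illustrated with a 1-nearest-neighbour rule; the training examples, attributes, and labels below are invented, not a prescribed method:

```python
# Invented training data: (age, debt ratio) -> risk label.
train = [
    ((25, 0.9), "high-risk"),
    ((30, 0.8), "high-risk"),
    ((45, 0.2), "low-risk"),
    ((50, 0.1), "low-risk"),
]

def classify(point):
    """Assign the label of the closest training example (1-NN)."""
    def dist2(p, q):
        return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    return min(train, key=lambda ex: dist2(ex[0], point))[1]

print(classify((28, 0.85)))   # lands near the high-risk examples
print(classify((48, 0.15)))   # lands near the low-risk examples
```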

Cluster Analysis : Groups data objects into clusters or categories based on
their similarity, without predefined labels.

Example : Clustering houses to find distribution patterns.
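One common clustering method is k-means; a tiny one-dimensional sketch (the house prices and starting centres are invented):

```python
def kmeans_1d(values, centers, rounds=10):
    """Tiny k-means on one attribute: assign each value to its nearest
    centre, then recompute each centre as the mean of its cluster."""
    for _ in range(rounds):
        clusters = {c: [] for c in centers}
        for v in values:
            nearest = min(centers, key=lambda c: abs(c - v))
            clusters[nearest].append(v)
        centers = [sum(g) / len(g) for g in clusters.values() if g]
    return sorted(centers)

# Invented house prices (in lakhs) with two visible groups.
prices = [20, 22, 25, 80, 85, 90]
print(kmeans_1d(prices, centers=[20, 90]))
```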

Outlier Analysis :

1. Detects data points that significantly differ from other data points in the
dataset.

2. Outliers often represent anomalies or rare events.

Example : Detecting fraudulent credit card transactions that don’t match
typical spending behavior.
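A simple way to flag such points is the z-score rule: values lying more than a chosen number of standard deviations from the mean are treated as outliers. A sketch with invented transaction amounts:

```python
import statistics

def zscore_outliers(values, threshold=2.0):
    """Flag values whose z-score magnitude exceeds the threshold."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [v for v in values if abs(v - mean) / sd > threshold]

# Invented card transaction amounts; one does not match typical spending.
amounts = [40, 55, 48, 60, 52, 45, 58, 950]
print(zscore_outliers(amounts))
```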

Trend and Evolution Analysis : Studies patterns over time to understand
how data evolves and detects trends, patterns, or regularities.

Example : Analyzing stock market data to identify upward or downward
trends over time.
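A moving average is a simple way to expose such a trend under day-to-day noise; a sketch with invented closing prices:

```python
def moving_average(series, window):
    """Smooth a time series with a sliding-window mean to expose its trend."""
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]

# Invented closing prices: noisy day-to-day, but trending upward.
closes = [100, 98, 103, 105, 102, 108, 110, 107, 113]
print(moving_average(closes, window=3))   # the smoothed values rise steadily
```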

Other pattern-directed or statistical analysis : This includes other
techniques used to discover patterns.

5. Difference between OLAP and OLTP in data mining and data
warehouse systems.
ANS :

| OLAP (Online Analytical Processing) | OLTP (Online Transaction Processing) |
| --- | --- |
| Designed for data analysis and decision-making. | Primarily focused on managing and processing day-to-day transactions. |
| Utilizes a multidimensional data model (data cubes). | Utilizes a relational data model. |
| Handles large volumes of data. | Handles smaller, real-time transaction data that is current and frequently updated. |
| Supports complex queries. | Optimized for simple, quick operations. |
| Queries are typically complex and involve aggregations, slicing, dicing, and drill-down operations. | Queries are generally straightforward and involve single-row operations. |
| Response times can be longer due to the complexity of the queries. | Response times are very fast due to simple queries. |
| Uses a multidimensional model for query, reporting, and aggregation. | Uses a normalized data model for faster database operations. |
| Example : Business intelligence tools. | Example : Banking and e-commerce systems. |
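The OLAP side can be illustrated with a tiny roll-up: aggregating a fact table along one dimension of a made-up region × product cube (all the rows and dimension names are invented).

```python
from collections import defaultdict

# Invented fact rows: (region, product, sales) — a mini data cube.
facts = [
    ("North", "TV", 10), ("North", "Phone", 5),
    ("South", "TV", 7),  ("South", "Phone", 12),
]

def roll_up(rows, dim):
    """Aggregate sales along one dimension (an OLAP-style roll-up)."""
    totals = defaultdict(int)
    for region, product, sales in rows:
        key = region if dim == "region" else product
        totals[key] += sales
    return dict(totals)

print(roll_up(facts, "region"))   # totals per region
print(roll_up(facts, "product"))  # totals per product
```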

7. Preprocessing Techniques in Data Mining.


ANS :
REFER TO THE VIDEO BELOW FOR THE EXPLANATION :

https://youtu.be/us0JWCywAng
