L-1 Data Mining Issues

Program: B.Tech., CD, 5th Sem, 3rd year

CD 503: Data Mining & Warehousing

Unit 2
Topic: Mining Issues

July-Dec 2024
Lecture 1
D D Shrivastava
Assistant Professor
Institute of Technology & Management, IT

Content
• Prerequisite of topic

• Data Mining Issues

• Mining methodology and user interaction issues

• Performance issues

• Issues relating to the diversity of database types

• Learning Outcomes

• References

Prerequisite of topic
• Students should be clear about the definitions of data mining and data warehousing.

• Students should have a general idea of database concepts and data models.

• Students should have a general idea of data types.

• Students should have a general idea of classification and prediction.

Data Mining Issues
Data mining systems face many challenges and issues, which fall into three groups:
1. Mining methodology and user interaction issues
2. Performance issues
3. Issues relating to the diversity of database types

CD 503 <SELO: 1,9> Reference –R1,R5 03

Mining methodology and user interaction issues
1. Mining methodology and user interaction issues: These reflect the kinds of
knowledge mined, the ability to mine knowledge at multiple granularities, the use
of domain knowledge, ad-hoc mining, and knowledge visualization.

1. Mining different kinds of knowledge in databases.
2. Interactive mining of knowledge at multiple levels of abstraction.
3. Incorporation of background knowledge.
4. Data mining query languages and ad-hoc data mining
5. Presentation and visualization of data mining results
6. Handling outlier or incomplete data
7. Pattern Evaluation


1. Mining different kinds of knowledge in databases
Since different users can be interested in different kinds of knowledge, data
mining should cover a wide spectrum of data analysis and knowledge discovery
tasks, including data characterization, discrimination, association,
classification, clustering, trend analysis, and similarity analysis.


2. Interactive mining of knowledge at multiple levels of abstraction
For databases containing a huge amount of data, appropriate sampling techniques
can first be applied to facilitate interactive data exploration.

Interactive mining allows users to focus the search for patterns, providing and
refining data mining requests based on returned results.
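The sampling step mentioned above can be sketched as follows. This is a minimal illustration, not a method prescribed by the lecture: it assumes reservoir sampling as the sampling technique, and the function name `reservoir_sample` and the demo data are hypothetical.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(rows: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Pick k rows uniformly at random from a stream, scanning it only once."""
    rng = random.Random(seed)
    sample: List[T] = []
    for i, row in enumerate(rows):
        if i < k:
            sample.append(row)       # fill the reservoir with the first k rows
        else:
            j = rng.randint(0, i)    # randint is inclusive on both ends
            if j < k:
                sample[j] = row      # keep the new row with probability k / (i + 1)
    return sample


# Hypothetical usage: preview 5 rows of a large table before mining it in full.
preview = reservoir_sample(range(100_000), k=5)
```

The user can explore `preview` interactively, refine the request, and only then run the mining task on the full data.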

Specifically, knowledge should be mined by drilling down and rolling up through the
data space and knowledge space interactively, similar to what OLAP (online
analytical processing) can do on data cubes. In this way, the user can interact with
the data mining system to view data and discovered patterns from different
angles.
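Roll-up and drill-down can be illustrated with a small aggregation over a fact table. The sketch below is only an assumption about how such a view might be computed; the `SALES` data, the dimension names, and the `aggregate` helper are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Sequence, Tuple

# Hypothetical sales facts: (country, city, quarter, amount)
SALES = [
    ("India", "Bhopal", "Q1", 120),
    ("India", "Bhopal", "Q2", 80),
    ("India", "Indore", "Q1", 200),
    ("Japan", "Tokyo", "Q1", 300),
]
DIMENSIONS = ("country", "city", "quarter")


def aggregate(facts: Sequence[tuple], group_by: Tuple[str, ...]) -> Dict[tuple, int]:
    """Sum the amount over the chosen dimensions (one view of the data cube)."""
    idx = [DIMENSIONS.index(d) for d in group_by]
    totals: Dict[tuple, int] = defaultdict(int)
    for row in facts:
        key = tuple(row[i] for i in idx)
        totals[key] += row[3]
    return dict(totals)


by_city = aggregate(SALES, ("country", "city"))  # detailed view
by_country = aggregate(SALES, ("country",))      # roll-up: city -> country
# Drill-down: go back from the country total to the cities of one country.
india_detail = {k: v for k, v in by_city.items() if k[0] == "India"}
```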


3. Incorporation of background knowledge
Background knowledge or information regarding the domain under study may be
used to guide the discovery process and allow discovered patterns to be expressed
at different levels of abstraction.

Domain knowledge related to databases, such as integrity constraints and
deduction rules, can help focus and speed up a data mining process or judge the
discovered patterns.
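One way background knowledge can judge discovered patterns is to discard rules that only restate what is already known. The sketch below assumes a very simple rule representation (a set of antecedent items and one consequent); the rule contents and the helper names `is_implied` and `filter_novel` are hypothetical.

```python
from typing import FrozenSet, List, Set, Tuple

Rule = Tuple[FrozenSet[str], str]  # (antecedent items, consequent)

# Hypothetical background knowledge, e.g. derived from an integrity constraint.
KNOWN_RULES: Set[Rule] = {
    (frozenset({"city=Bhopal"}), "country=India"),
}


def is_implied(rule: Rule, known: Set[Rule]) -> bool:
    """True if a known rule has the same consequent and a subset of the antecedent."""
    antecedent, consequent = rule
    return any(c == consequent and a <= antecedent for a, c in known)


def filter_novel(rules: List[Rule], known: Set[Rule]) -> List[Rule]:
    """Keep only the rules that the background knowledge does not already imply."""
    return [r for r in rules if not is_implied(r, known)]


discovered = [
    (frozenset({"city=Bhopal"}), "country=India"),
    (frozenset({"city=Bhopal", "segment=retail"}), "country=India"),
    (frozenset({"segment=retail"}), "payment=card"),
]
novel = filter_novel(discovered, KNOWN_RULES)
```

Here only the last rule survives, because the first two follow from the known constraint.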


4. Data mining query languages and ad-hoc data mining
High-level data mining query languages need to be developed to allow users to
describe ad-hoc data mining tasks by specifying the relevant sets of data for
analysis, the domain knowledge, the kinds of knowledge to be mined, and the
conditions and constraints to be enforced on the discovered patterns.


5. Presentation and visualization of data mining results
Discovered knowledge should be expressed in high-level languages, visual
representations, or other expressive forms so that the knowledge can be easily
understood and directly usable by humans. This is especially crucial if the data
mining system is to be interactive.


6. Handling outlier or incomplete data
The data stored in a database may contain outliers, exceptional cases, or incomplete
data objects. These objects may confuse the analysis process, causing overfitting
of the data to the knowledge model constructed. As a result, the accuracy of the
discovered patterns can be poor.

Data cleaning methods and data analysis methods that can handle outliers are
required. While most methods discard outlier data, such data may be of interest in
itself, for example in fraud detection, where the goal is to find unusual usage of
telecommunication services or credit cards. This form of data analysis is known as
outlier mining.
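A simple way to separate outliers from the rest of the data is the interquartile-range (IQR) rule. This is a sketch under stated assumptions, not the technique the lecture prescribes: the 1.5 x IQR fence is a common convention, and the `split_outliers` helper and the call-minutes data are hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple


def split_outliers(values: Sequence[float], k: float = 1.5) -> Tuple[List[float], List[float]]:
    """Return (inliers, outliers) using fences at k * IQR beyond the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # three quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    inliers = [v for v in values if low <= v <= high]
    outliers = [v for v in values if v < low or v > high]
    return inliers, outliers


# Hypothetical daily call minutes for one subscriber; one day looks unusual.
daily_call_minutes = [32, 35, 31, 40, 38, 36, 33, 410]
normal, suspicious = split_outliers(daily_call_minutes)
```

A cleaning step would continue with `normal`, while an outlier mining task such as fraud detection would inspect `suspicious`.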


7. Pattern Evaluation

A data mining system can uncover thousands of patterns. Many of the patterns
discovered may be uninteresting to the given user, for example because they
represent common knowledge. Several challenges remain regarding the development
of techniques that assess the interestingness of discovered patterns by estimating
the value of patterns with respect to a given user class.
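Objective interestingness measures are one starting point for pattern evaluation. The sketch below assumes the standard support, confidence, and lift measures for association rules, which the slide itself does not name; the `BASKETS` data is hypothetical, and the functions assume the antecedent and consequent each occur at least once.

```python
from typing import FrozenSet, Iterable, Sequence


def support(transactions: Sequence[FrozenSet[str]], itemset: Iterable[str]) -> float:
    """Fraction of transactions that contain every item of the itemset."""
    items = frozenset(itemset)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent) -> float:
    """Estimate of P(consequent | antecedent) from the transactions."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


def lift(transactions, antecedent, consequent) -> float:
    """Confidence relative to the baseline frequency of the consequent."""
    return confidence(transactions, antecedent, consequent) / support(transactions, consequent)


# Hypothetical market baskets.
BASKETS = [frozenset(t) for t in (
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk"},
)]
rule_lift = lift(BASKETS, {"bread"}, {"butter"})
```

A lift above 1 suggests the rule is more than a restatement of how common the consequent is. Subjective measures, which depend on what a given user class already knows, need additional input.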


2. Performance Issues
These include efficiency, scalability, and parallelization of data mining algorithms.

1. Efficiency and scalability of data mining algorithms
To effectively extract information from a huge amount of data in databases, data
mining algorithms must be efficient and scalable. That is, the running time of a data
mining algorithm must be predictable and acceptable in large databases.

2. Parallel, distributed, and incremental updating algorithms
The huge size of many databases, the wide distribution of data, and the
computational complexity of some data mining methods are factors motivating
the development of parallel and distributed data mining algorithms. Such
algorithms divide the data into partitions, which are processed in parallel. The
results from the partitions are then merged.
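The partition, process, and merge pattern can be sketched with item counting. This is an illustrative assumption rather than a specific algorithm from the lecture; the helper names and `TRANSACTIONS` are hypothetical. Threads are used only to keep the example portable; for CPU-bound work in CPython, `ProcessPoolExecutor` from the same module offers the same `map` interface.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def partition(rows: Sequence, n_parts: int) -> List[Sequence]:
    """Split the rows into n_parts slices of roughly equal size."""
    return [rows[i::n_parts] for i in range(n_parts)]


def count_items(transactions: Sequence[Sequence[str]]) -> Counter:
    """Count in how many transactions each item occurs (one partition)."""
    counts: Counter = Counter()
    for t in transactions:
        counts.update(set(t))
    return counts


def parallel_item_counts(transactions: Sequence[Sequence[str]], n_parts: int = 4) -> Counter:
    """Count items per partition concurrently, then merge the partial counts."""
    parts = partition(transactions, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(count_items, parts))
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)  # merging step: add the partial counts together
    return total


# Hypothetical transactions.
TRANSACTIONS = [
    ["bread", "butter"], ["bread", "milk"], ["milk"],
    ["bread", "butter", "milk"], ["butter"],
]
merged = parallel_item_counts(TRANSACTIONS, n_parts=3)
```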

Moreover, the high cost of some data mining processes promotes the need for
incremental data mining algorithms, which incorporate database updates without
having to mine the entire data again. Such algorithms perform knowledge
modification incrementally to strengthen what was previously discovered.
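Incremental updating can be sketched with a model that keeps running item counts and folds in new transactions as they arrive. The class `IncrementalItemCounts` and its methods are hypothetical and stand in for whatever knowledge model is being maintained.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence


class IncrementalItemCounts:
    """Keeps item counts up to date without rescanning earlier transactions."""

    def __init__(self) -> None:
        self.n_transactions = 0
        self.counts: Counter = Counter()

    def update(self, new_transactions: Iterable[Sequence[str]]) -> None:
        """Fold only the newly arrived transactions into the stored counts."""
        for t in new_transactions:
            self.counts.update(set(t))
            self.n_transactions += 1

    def frequent(self, min_support: float) -> Dict[str, float]:
        """Items whose support over all transactions seen so far meets the threshold."""
        if self.n_transactions == 0:
            return {}
        return {
            item: count / self.n_transactions
            for item, count in self.counts.items()
            if count / self.n_transactions >= min_support
        }


model = IncrementalItemCounts()
model.update([["a", "b"], ["a"]])       # initial load
before = model.frequent(0.6)
model.update([["b"], ["b", "c"]])       # database update: only new rows are read
after = model.frequent(0.6)
```

The second call to `update` reads only the two new transactions, yet `after` reflects all four.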


3. Issues relating to the diversity of database types

1. Handling of relational and complex types of data
There are many kinds of data stored in databases and data warehouses. Since
relational databases and data warehouses are widely used, the development of
efficient and effective data mining systems for such data is important.

However, other databases may contain complex data objects, hypertext and
multimedia data, spatial data, temporal data, or transaction data. It is unrealistic
to expect one system to mine all kinds of data due to the diversity of data types
and different goals of data mining. Therefore, one may expect to have different
data mining systems for different kinds of data.


2. Mining information from heterogeneous databases and global information systems
Local and wide-area computer networks (such as the Internet) connect many
sources of data, forming huge, distributed, and heterogeneous databases. The
discovery of knowledge from different sources of structured, semi-structured, or
unstructured data with diverse data semantics poses great challenges to data mining.
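A small part of this challenge, mapping sources with different formats and field names onto one schema before mining, can be sketched as follows. The two sources, their field names, and the target schema are all hypothetical.

```python
import csv
import io
import json
from typing import Dict, List

# Hypothetical structured source (CSV) and semi-structured source (JSON)
# that describe the same kind of entity with different field names.
CSV_SOURCE = "cust_id,city\n1,Bhopal\n2,Indore\n"
JSON_SOURCE = '[{"customerId": 3, "location": {"city": "Pune"}}]'


def from_csv(text: str) -> List[Dict[str, object]]:
    """Map CSV rows onto the common schema {customer_id, city}."""
    reader = csv.DictReader(io.StringIO(text))
    return [{"customer_id": int(r["cust_id"]), "city": r["city"]} for r in reader]


def from_json(text: str) -> List[Dict[str, object]]:
    """Map nested JSON objects onto the same common schema."""
    return [
        {"customer_id": int(o["customerId"]), "city": o["location"]["city"]}
        for o in json.loads(text)
    ]


# After normalization, both sources can be mined as one data set.
unified = from_csv(CSV_SOURCE) + from_json(JSON_SOURCE)
```

Real heterogeneous sources also differ in semantics (units, codes, identifiers), which such a field mapping alone does not resolve.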


Student Effective Learning Outcome
1. Learn about the Data Mining Issues

2. Learn about the Mining methodology and user interaction issues

3. Learn about the Performance issues

4. Learn about the Issues relating to the diversity of database types

References
1. R1: J. Han and M. Kamber, “Data Mining: Concepts and Techniques” (e-book),
Morgan Kaufmann Publishers, Elsevier India, New Delhi, 2003.

2. R2: Berson, “Data Warehousing, Data Mining & OLAP”, TMH.

3. R3: W.H. Inmon, “Building the Data Warehouse”, 3rd ed., Wiley India.

4. R4: Anahory, “Data Warehousing in the Real World”, Pearson Education.

5. R5: http://www.brainkart.com/article/Major-Issues-in-Data-Mining
