L-1 Data Mining Issues

Cd 503


Program: B.Tech., CD, 5th Sem, 3rd Year

CD 503: Data Mining & Warehousing

Unit 2
Topic: Mining Issues

July-Dec 2024
Lecture 1
D D Shrivastava
Assistant Professor
Institute of Technology & Management, IT
Content
• Prerequisite of topic

• Data Mining Issues

• Mining methodology and user interaction issues

• Performance issues

• Issues relating to the diversity of database types

• Learning Outcomes

• References

Prerequisite of topic
• Students should be clear on the definitions of data mining and data warehousing.

• Students should have a general idea of database concepts and data models.

• Students should have a general idea of data types.

• Students should have a general idea of classification and prediction.

Data Mining Issues
Data mining systems face many challenges, which fall into three groups:
1. Mining methodology and user interaction issues
2. Performance issues
3. Issues relating to the diversity of database types

CD 503 <SELO: 1,9> Reference –R1,R5 03


Mining methodology and user interaction issues
These issues reflect the kinds of knowledge mined, the ability to mine knowledge at
multiple levels of abstraction, the use of domain knowledge, ad-hoc mining, and
knowledge visualization. They include:
1. Mining different kinds of knowledge in databases
2. Interactive mining of knowledge at multiple levels of abstraction
3. Incorporation of background knowledge
4. Data mining query languages and ad-hoc data mining
5. Presentation and visualization of data mining results
6. Handling outlier or incomplete data
7. Pattern evaluation



1. Mining different kinds of knowledge in databases
Since different users can be interested in different kinds of knowledge, data
mining should cover a wide spectrum of data analysis and knowledge discovery
tasks, including data characterization, discrimination, association,
classification, clustering, trend analysis, and similarity analysis.



2. Interactive mining of knowledge at multiple levels of abstraction
For databases containing a huge amount of data, an appropriate sampling technique
can first be applied to facilitate interactive data exploration.

Interactive mining allows users to focus the search for patterns, providing and
refining data mining requests based on the returned results.
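
As a minimal sketch of sampling before interactive exploration, the snippet below draws a simple random sample from a large set of records and compares a summary computed on the sample with the same summary on the full data. The record layout, the field name "amount", the sample size, and the function names are illustrative assumptions, not part of the lecture.

```python
import random

def sample_records(records, k, seed=0):
    """Return a simple random sample of k records, without replacement."""
    rng = random.Random(seed)      # fixed seed keeps the exploration repeatable
    k = min(k, len(records))       # never request more rows than exist
    return rng.sample(records, k)

def mean_amount(records):
    """Average of the hypothetical 'amount' field."""
    return sum(r["amount"] for r in records) / len(records)

# Hypothetical data: 100,000 purchase records with amounts cycling 1..100.
records = [{"id": i, "amount": (i % 100) + 1} for i in range(100_000)]
sample = sample_records(records, k=1_000)

print(len(sample))             # 1000
print(mean_amount(records))    # 50.5 (exact, full scan)
print(mean_amount(sample))     # an estimate close to the full-data mean
```

The sample gives a quick approximate answer; the user can then refine the request and rerun it on the full data only when needed.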

Specifically, knowledge should be mined by drilling down and rolling up through the
data space and knowledge space interactively, similar to what OLAP (online
analytical processing) does on data cubes. In this way, the user can interact with
the data mining system to view data and discovered patterns from different
angles.
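
The roll-up and drill-down operations can be sketched on a tiny fact table. This is only an illustration in plain Python, assuming a two-level location hierarchy (city within country); the data and the function names `aggregate` and `drill_down` are hypothetical.

```python
from collections import defaultdict

# Hypothetical sales facts: (country, city, amount).
sales = [
    ("India", "Gwalior", 120),
    ("India", "Bhopal", 80),
    ("India", "Gwalior", 50),
    ("Japan", "Tokyo", 200),
    ("Japan", "Osaka", 70),
]

def aggregate(facts, level):
    """Total sales grouped at the given level of the location hierarchy."""
    totals = defaultdict(int)
    for country, city, amount in facts:
        key = country if level == "country" else (country, city)
        totals[key] += amount
    return dict(totals)

def drill_down(facts, country):
    """City-level totals for one country (a more detailed view)."""
    by_city = aggregate(facts, "city")
    return {city: total for (c, city), total in by_city.items() if c == country}

by_country = aggregate(sales, "country")   # roll-up: city -> country
india = drill_down(sales, "India")         # drill-down: country -> city

print(by_country)   # {'India': 250, 'Japan': 270}
print(india)        # {'Gwalior': 170, 'Bhopal': 80}
```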



3. Incorporation of background knowledge
Background knowledge or information regarding the domain under study may be
used to guide the discovery process and allow discovered patterns to be expressed
at different levels of abstraction.

Domain knowledge related to databases, such as integrity constraints and
deduction rules, can help focus and speed up a data mining process or judge the
discovered patterns.
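
One common form of background knowledge is a concept hierarchy. The sketch below assumes a small hand-written hierarchy (the `PARENT` mapping and the item names are hypothetical) and shows how the same discovered itemset can be expressed at higher levels of abstraction.

```python
# Hypothetical concept hierarchy: each item points to its more general concept.
PARENT = {
    "whole milk": "milk",
    "skim milk": "milk",
    "milk": "dairy",
    "cheddar": "cheese",
    "cheese": "dairy",
}

def generalize(item, levels=1):
    """Climb the hierarchy by up to `levels` steps, stopping at the top."""
    for _ in range(levels):
        if item not in PARENT:
            break
        item = PARENT[item]
    return item

def generalize_pattern(pattern, levels=1):
    """Rewrite an itemset at a higher level of abstraction."""
    return frozenset(generalize(item, levels) for item in pattern)

pattern = frozenset({"whole milk", "cheddar"})
print(sorted(generalize_pattern(pattern, 1)))   # ['cheese', 'milk']
print(sorted(generalize_pattern(pattern, 2)))   # ['dairy']
```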



4. Data mining query languages and ad-hoc data mining
High-level data mining query languages need to be developed to allow users to
describe ad-hoc data mining tasks by specifying the relevant sets of data for
analysis, the domain knowledge, the kinds of knowledge to be mined, and the
conditions and constraints on the discovered patterns.
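
The four parts of such a task description can be pictured as a small data structure. This is not a real query language; the `MiningTask` class, its field names, and the example values are hypothetical and only show what an ad-hoc request needs to carry.

```python
from dataclasses import dataclass, field

@dataclass
class MiningTask:
    """Hypothetical container for the parts of an ad-hoc mining request."""
    relevant_data: str                                # which data to analyze
    knowledge_kind: str                               # e.g. "association"
    background: dict = field(default_factory=dict)    # domain knowledge
    constraints: dict = field(default_factory=dict)   # conditions on patterns

    def describe(self):
        """One-line summary of the request."""
        return (f"mine {self.knowledge_kind} from {self.relevant_data} "
                f"subject to {self.constraints}")

task = MiningTask(
    relevant_data="sales where year = 2024",
    knowledge_kind="association",
    background={"location": ["city", "country"]},
    constraints={"min_support": 0.05, "min_confidence": 0.7},
)
print(task.describe())
```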



5. Presentation and visualization of data mining results
Discovered knowledge should be expressed in high-level languages, visual
representations, or other expressive forms so that the knowledge can be easily
understood and directly usable by humans. This is especially crucial if the data
mining system is to be interactive.



6. Handling outlier or incomplete data
The data stored in a database may contain outliers, exceptional cases, or incomplete
data objects. These objects may confuse the analysis process, causing the knowledge
model constructed to overfit the data. As a result, the accuracy of the
discovered patterns can be poor.

Data cleaning methods and data analysis methods that can handle outliers are
required. While most methods discard outlier data, such data may be of interest in
itself, for example in fraud detection, where the goal is to find unusual usage of
telecommunication services or credit cards. This form of data analysis is known as
outlier mining.
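
As a minimal sketch of outlier mining, the snippet below flags unusual values with the interquartile-range (IQR) rule. The IQR rule is one simple choice made here for illustration, not a method named in the lecture, and the spending data is hypothetical.

```python
import statistics

def find_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)   # three quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical daily card spending; one day is far from the rest.
daily_spend = [42, 38, 45, 40, 39, 41, 44, 43, 37, 40, 41, 39, 950]

print(find_outliers(daily_spend))   # [950]
```

In a cleaning step the flagged value would be dropped or corrected; in fraud detection it is the value of interest.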



7. Pattern Evaluation

A data mining system can uncover thousands of patterns. Many of the patterns
discovered may be uninteresting to the given user, for example because they
represent common knowledge. Several challenges remain regarding the development of
techniques to assess the interestingness of discovered patterns, which estimate the
value of patterns with respect to a given user class.
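
Support and confidence are two widely used objective measures of interestingness for association rules. The sketch below assumes a toy transaction set and illustrative thresholds; the function names and the threshold values are assumptions, and a real system would usually combine such measures with user-specific criteria.

```python
from fractions import Fraction

# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(itemset <= t for t in transactions)
    return Fraction(hits, len(transactions))

def confidence(antecedent, consequent, transactions):
    """Support of the whole rule divided by support of its antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def is_interesting(antecedent, consequent, transactions,
                   min_support=Fraction(3, 10), min_confidence=Fraction(7, 10)):
    """Keep a rule only if it clears both thresholds."""
    both = set(antecedent) | set(consequent)
    return (support(both, transactions) >= min_support
            and confidence(antecedent, consequent, transactions) >= min_confidence)

print(support({"bread", "milk"}, transactions))              # 3/5
print(confidence({"bread"}, {"milk"}, transactions))         # 3/4
print(is_interesting({"bread"}, {"milk"}, transactions))     # True
print(is_interesting({"bread"}, {"butter"}, transactions))   # False
```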



2. Performance Issues
These include efficiency, scalability, and parallelization of data mining algorithms.
1. Efficiency and scalability of data mining algorithms
To effectively extract information from a huge amount of data in databases, data
mining algorithms must be efficient and scalable. That is, the running time of a data
mining algorithm must be predictable and acceptable in large databases.

2. Parallel, distributed, and incremental updating algorithms
The huge size of many databases, the wide distribution of data, and the
computational complexity of some data mining methods are factors motivating
the development of parallel and distributed data mining algorithms. Such
algorithms divide the data into partitions, which are processed in parallel. The
results from the partitions are then merged.
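
The partition, process, and merge pattern can be sketched as follows. Item counting stands in for a real mining step, and the data, the helper names, and the use of a thread pool are assumptions made for illustration; CPU-bound work in CPython would more likely use separate processes or a distributed framework.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def partition(data, n_parts):
    """Split data into at most n_parts contiguous chunks of similar size."""
    if not data:
        return []
    size = -(-len(data) // n_parts)   # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]

def count_items(chunk):
    """Local mining step: item frequencies within one partition."""
    counts = Counter()
    for transaction in chunk:
        counts.update(transaction)
    return counts

def parallel_item_counts(data, n_parts=4):
    """Count items per partition concurrently, then merge the partial results."""
    chunks = partition(data, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(count_items, chunks))
    merged = Counter()
    for partial in partials:
        merged.update(partial)        # merge step: add up partial counts
    return merged

# Hypothetical transactions.
data = [["a", "b"], ["a", "c"], ["b", "c"], ["a"], ["c"], ["a", "b", "c"]]
print(parallel_item_counts(data, n_parts=3))
```

The merged counts equal what a single pass over all the data would produce, which is the property a partitioned algorithm has to preserve.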

Moreover, the high cost of some data mining processes promotes the need for
incremental data mining algorithms, which incorporate database updates without
having to mine the entire data again. Such algorithms perform knowledge
modification incrementally to strengthen what was previously discovered.
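
A minimal sketch of incremental updating, assuming the mined knowledge is simply a table of item frequencies: each new batch of transactions updates the stored counts, so earlier data is never rescanned. The class name, method names, and data are hypothetical.

```python
from collections import Counter

class IncrementalItemCounts:
    """Keeps item frequencies up to date as new transactions arrive."""

    def __init__(self):
        self.counts = Counter()
        self.n_transactions = 0

    def update(self, new_transactions):
        """Fold a batch of new transactions into the existing counts."""
        for transaction in new_transactions:
            self.counts.update(set(transaction))   # count each item once per transaction
            self.n_transactions += 1

    def frequent_items(self, min_support):
        """Items whose relative frequency is at least min_support."""
        if self.n_transactions == 0:
            return []
        return sorted(item for item, count in self.counts.items()
                      if count / self.n_transactions >= min_support)

model = IncrementalItemCounts()
model.update([["a", "b"], ["a"], ["b", "c"]])   # initial database
print(model.frequent_items(0.5))                # ['a', 'b']

model.update([["c"], ["c", "a"]])               # database update only
print(model.frequent_items(0.5))                # ['a', 'c']
```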



3. Issues relating to the diversity of database types

1. Handling of relational and complex types of data
There are many kinds of data stored in databases and data warehouses. Since
relational databases and data warehouses are widely used, the development of
efficient and effective data mining systems for such data is important.

However, other databases may contain complex data objects, hypertext and
multimedia data, spatial data, temporal data, or transaction data. It is unrealistic
to expect one system to mine all kinds of data due to the diversity of data types
and different goals of data mining. Therefore, one may expect to have different
data mining systems for different kinds of data.



2. Mining information from heterogeneous databases and global information systems
Local and wide-area computer networks (such as the Internet) connect many
sources of data, forming large, distributed, and heterogeneous databases. The
discovery of knowledge from sources of structured, semi-structured, or
unstructured data with diverse data semantics poses great challenges to data mining.



Student Effective Learning Outcome
1. Learn about the Data Mining Issues

2. Learn about the Mining methodology and user interaction issues

3. Learn about the Performance issues

4. Learn about the Issues relating to the diversity of database types

References
1. R1: E-book: J. Han and M. Kamber, “Data Mining: Concepts and Techniques”,
Morgan Kaufmann Publishers, Elsevier India, New Delhi, 2003.

2. R2: Berson, “Data Warehousing, Data Mining & OLAP”, TMH.

3. R3: W.H. Inmon, “Building the Data Warehouse”, 3rd ed., Wiley India.

4. R4: Anahory, “Data Warehousing in the Real World”, Pearson Education.

5. R5: http://www.brainkart.com/article/Major-Issues-in-Data-Mining
