Data Mining L-3,4

The document discusses various types of data repositories applicable for data mining, including relational databases, data warehouses, and advanced database systems. It outlines the characteristics and structures of these databases, as well as the integration of data mining systems with database systems. Additionally, it addresses major issues in data mining, such as performance, user interaction, and the diversity of data types.

Rishi Sharma

IIIT Surat
Data for Data Mining
❖ Data mining should be applicable to any kind of data repository, as well as to
transient data, such as data streams.
➢ Relational databases,
➢ data warehouses,
➢ transactional databases,
➢ advanced database systems,
➢ flat files,
➢ data streams, and
➢ the World Wide Web.
❖ Advanced database systems include object-relational databases and
specific application-oriented databases: spatial databases, time-series
databases, text databases, and multimedia databases.
Relational Databases
❖ A relational database is a collection of tables, each of which is assigned a
unique name.
❖ Each table consists of a set of attributes and usually stores a large set of
tuples.
❖ Each tuple in a relational table represents an object identified by a unique key
and described by a set of attribute values.
❖ A semantic data model, such as an entity-relationship (ER) data model, is
often constructed for relational databases.
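The points above (named tables, attributes, tuples identified by a unique key) can be sketched with Python's built-in sqlite3 module. The table name, column names, and rows below are hypothetical and serve only as an illustration.

```python
import sqlite3

# Hypothetical schema and rows, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer ("
    " cust_id INTEGER PRIMARY KEY,"  # unique key identifying each tuple
    " name TEXT,"
    " city TEXT)"
)
conn.executemany(
    "INSERT INTO customer (cust_id, name, city) VALUES (?, ?, ?)",
    [(1, "Asha", "Surat"), (2, "Ravi", "Pune"), (3, "Meera", "Surat")],
)

# Each returned row is one tuple: a key plus the requested attribute values.
surat_customers = conn.execute(
    "SELECT cust_id, name FROM customer WHERE city = ? ORDER BY cust_id",
    ("Surat",),
).fetchall()
print(surat_customers)
```

A relational query such as this one is typically how the task-relevant subset of data is selected before any mining takes place.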
Data Warehouses
❖ A data warehouse is a repository of information collected from multiple sources,
stored under a unified schema, and that usually resides at a single site.
❖ Data warehouses are constructed via a process of data cleaning, data
integration, data transformation, data loading, and periodic data refreshing.
❖ A data warehouse is usually modeled by a multidimensional database structure,
where each dimension corresponds to an attribute or a set of attributes in the
schema, and each cell stores the value of some aggregate measure.
❖ The actual physical structure of a data warehouse may be a relational data
store or a multidimensional data cube.
Fig: Framework of a data warehouse
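The multidimensional model described above, where each cell stores an aggregate measure, can be illustrated with a small sketch. The fact rows, the dimension names, and the helper `build_cube` are assumptions made for this example, not part of any warehouse product.

```python
from collections import defaultdict

# Hypothetical sales facts: (quarter, city, item, amount).
facts = [
    ("Q1", "Surat", "phone", 100.0),
    ("Q1", "Surat", "laptop", 250.0),
    ("Q1", "Pune", "phone", 80.0),
    ("Q2", "Surat", "phone", 120.0),
]


def build_cube(rows, dims):
    """Sum the amount for every combination of the chosen dimensions."""
    cube = defaultdict(float)
    for quarter, city, item, amount in rows:
        record = {"quarter": quarter, "city": city, "item": item}
        cell = tuple(record[d] for d in dims)  # one cell per dimension combination
        cube[cell] += amount
    return dict(cube)


by_quarter_city = build_cube(facts, ("quarter", "city"))
by_quarter = build_cube(facts, ("quarter",))  # coarser view with fewer dimensions
print(by_quarter_city)
print(by_quarter)
```

Dropping a dimension, as in `by_quarter`, gives a more summarized view of the same measure.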
Transactional Databases
❖ A transactional database consists of a file where each record represents a
transaction. A transaction typically includes a unique transaction identity
number (trans ID) and a list of the items making up the transaction.
❖ The transactional database may have additional tables that hold related
information, such as the date of the transaction, the customer ID number, and
the ID numbers of the salesperson and of the branch.
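As a minimal sketch of this structure, the transactions below map a trans ID to its list of items, and a simple count shows in what fraction of transactions each item appears. The IDs and items are invented for illustration.

```python
from collections import Counter

# Hypothetical transactional data: trans ID -> list of items.
transactions = {
    "T100": ["bread", "milk"],
    "T200": ["bread", "butter", "milk"],
    "T300": ["butter"],
}

# Count each item at most once per transaction.
item_counts = Counter(
    item for items in transactions.values() for item in set(items)
)

# Fraction of transactions that contain each item.
support = {
    item: count / len(transactions) for item, count in item_counts.items()
}
print(item_counts)
print(support)
```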
Advanced Data and Information Systems and Advanced Applications

❖ The new database applications include handling:
➢ spatial data (such as maps),
➢ engineering design data (design of buildings, system components, or integrated circuits),
➢ hypertext and multimedia data (including text, image, video, and audio data),
➢ time-related data (such as historical records or stock exchange data),
➢ stream data (such as video surveillance and sensor data streams), and
➢ the World Wide Web.
Object-Relational Databases
❖ Object-relational databases are constructed based on an object-relational
data model.
❖ This model extends the relational model by providing a rich data type for
handling complex objects and object orientation.
❖ Because many applications need to handle complex objects and structures,
object-relational databases are becoming increasingly popular in industry.
Temporal Databases, Sequence Databases, and
Time-Series Databases

❖ A temporal database stores relational data that include time-related attributes.
These attributes may involve several timestamps, each having different
semantics.
❖ A sequence database stores sequences of ordered events, with or without a
concrete notion of time.
❖ A time-series database stores sequences of values or events obtained over
repeated measurements of time.
❖ Data mining techniques can be used to find the characteristics of object
evolution, or the trend of changes for objects in the database. Such information
can be useful in decision making and strategy planning.
❖ E.g., banking data, stock data, and traffic information.
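One simple way to look at the trend of changes in a time series is a moving average. This is only one possible technique, chosen here for illustration; the price values and the function `moving_average` are hypothetical.

```python
def moving_average(values, window):
    """Return the mean of each run of `window` consecutive values."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Hypothetical daily closing prices.
prices = [10.0, 11.0, 12.0, 11.0, 13.0, 15.0]
trend = moving_average(prices, 3)
print(trend)
```

The smoothed values rise from the first window to the last, which suggests an upward trend in this toy series.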
Spatial Databases and Spatiotemporal Databases
❖ Spatial databases contain spatial-related information.
➢ E.g., geographic (map) databases,
➢ very large-scale integration (VLSI) or computer-aided design databases, and
➢ medical and satellite image databases.
❖ Spatial data may be represented in raster format, consisting of n-dimensional
bit maps or pixel maps.
❖ A spatial database that stores spatial objects that change with time is called
a spatiotemporal database.
➢ E.g., group the trends of moving objects and identify strangely moving vehicles, or
➢ distinguish a bioterrorist attack from a normal outbreak of the flu based on the geographic
spread of a disease with time.
Text Databases and Multimedia Databases
❖ Text databases are databases that contain word descriptions for objects.
➢ These word descriptions are usually not simple keywords but rather long sentences or
paragraphs, such as product specifications, error or bug reports, warning messages,
summary reports, notes, or other documents.
❖ Multimedia databases store image, audio, and video data.
➢ E.g: picture content-based retrieval, voice-mail systems, video-on-demand systems, the
World Wide Web, and speech-based user interfaces that recognize spoken commands.
Heterogeneous Databases and Legacy Databases
❖ A heterogeneous database consists of a set of interconnected, autonomous
component databases. The components communicate in order to exchange
information and answer queries.
❖ A legacy database is a group of heterogeneous databases that combines
different kinds of data systems, such as relational or object-oriented
databases, hierarchical databases, network databases, spreadsheets,
multimedia databases, or file systems.
❖ The heterogeneous databases in a legacy database may be connected by
intra- or inter-computer networks.
Data Streams
❖ Many applications involve the generation and analysis of a new kind of data,
called stream data, where data flow in and out of an observation platform.
❖ Data streams have the following unique features: huge or possibly infinite
volume, dynamically changing, flowing in and out in a fixed order, allowing
only one or a small number of scans, and demanding fast response time.
❖ Mining data streams involves the efficient discovery of general patterns and
dynamic changes within stream data.
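Because a stream may allow only one scan, summary statistics have to be maintained incrementally in constant memory. The sketch below uses Welford's single-pass method for the mean and variance; it is one possible approach, and the class `RunningStats` and the sensor readings are hypothetical.

```python
class RunningStats:
    """Single-pass mean and population variance in constant memory."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        return self._m2 / self.n if self.n else 0.0


def sensor_stream():
    # Hypothetical readings; a real stream would be unbounded.
    yield from [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]


stats = RunningStats()
for reading in sensor_stream():
    stats.update(reading)  # each reading is seen exactly once
print(stats.n, stats.mean, stats.variance)
```

No reading is stored after it has been processed, so memory use does not grow with the length of the stream.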
The World Wide Web
❖ The World Wide Web and its associated distributed information services, such
as Yahoo!, Google, America Online, and AltaVista, provide rich, worldwide,
on-line information services, where data objects are linked together to
facilitate interactive access.
❖ Capturing user access patterns in such distributed information environments
is called Web usage mining (or Weblog mining).
❖ Automated Web page clustering and classification help group and arrange
Web pages in a multidimensional manner based on their contents. Web
community analysis helps identify hidden Web social networks and
communities and observe their evolution.
Data Mining Task Primitives
A data mining task can be specified in the form of a data mining query, which is
defined in terms of data mining task primitives:
❖ The set of task-relevant data to be mined
❖ The kind of knowledge to be mined
❖ The background knowledge to be used in the discovery process
❖ The interestingness measures and thresholds for pattern evaluation
❖ The expected representation for visualizing the discovered patterns
❖ Task-relevant data
➢ Database or data warehouse name
➢ Database tables or data warehouse cubes
➢ Conditions for data selection
➢ Relevant attributes or dimensions
➢ Data grouping criteria
❖ Knowledge type to be mined
➢ Characterization
➢ Discrimination
➢ Association/correlation
➢ Classification/prediction
➢ Clustering
❖ Background knowledge
➢ Concept hierarchies
➢ User beliefs about relationships in the data
❖ Pattern interestingness measures
➢ Simplicity
➢ Certainty (e.g., confidence)
➢ Utility (e.g., support)
➢ Novelty
❖ Visualization of discovered patterns
➢ Rules, tables, reports, charts, graphs, decision trees, and cubes
➢ Drill-down and roll-up
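The five primitives can be pictured as the fields of a single query object. The sketch below is only an illustration: the class `MiningQuery`, its field names, the database and table names, and the threshold values are all assumptions, not the syntax of any real data mining query language.

```python
from dataclasses import dataclass, field


@dataclass
class MiningQuery:
    """Hypothetical container holding the five task primitives."""

    task_relevant_data: dict
    knowledge_type: str
    background_knowledge: dict = field(default_factory=dict)
    interestingness: dict = field(default_factory=dict)
    presentation: list = field(default_factory=list)


query = MiningQuery(
    task_relevant_data={
        "database": "sales_db",               # hypothetical name
        "tables": ["customer", "purchases"],  # hypothetical tables
        "condition": "year = 2024",
        "attributes": ["age", "income", "item"],
    },
    knowledge_type="association",
    background_knowledge={"concept_hierarchy": {"city": "state"}},
    interestingness={"min_support": 0.05, "min_confidence": 0.7},
    presentation=["rules", "tables"],
)


def is_interesting(support, confidence, q):
    """Keep a pattern only if it meets both thresholds of the query."""
    return (
        support >= q.interestingness["min_support"]
        and confidence >= q.interestingness["min_confidence"]
    )


print(is_interesting(0.10, 0.80, query))
print(is_interesting(0.01, 0.90, query))
```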
Integration of a Data Mining System with a Database or Data
Warehouse System

A DM system works in an environment that requires it to communicate with other
information system components, such as database (DB) and data warehouse (DW)
systems. Possible integration schemes include:
❖ No coupling,
❖ Loose coupling,
❖ Semi-tight coupling, and
❖ Tight coupling
No Coupling

❖ The data mining system does not use any function of a DB or DW system, i.e., it does not
communicate with the database. Instead, it reads data from another source, such as a file system.
❖ Drawback:
➢ DB system provides a great deal of flexibility and efficiency at storing, organizing, accessing, and
processing data. Without using a DB/DW system, a DM system may spend a substantial amount of
time finding, collecting, cleaning, and transforming data.
➢ DM system will need to use other tools to extract data, making it difficult to integrate such a system
into an information processing environment. Thus, no coupling represents a poor design.
Loose Coupling
❖ The DM system will use some facilities of a DB or DW system, fetching data from
a data repository managed by these systems, performing data mining, and
then storing the mining results either in a file or in a database or data warehouse.
❖ Loose coupling is better than no coupling because it can fetch any portion of
data stored in databases or data warehouses by using query processing,
indexing, and other system facilities.
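❖ Advantage: it gains some of the flexibility, efficiency, and other features provided by
DB/DW systems.
❖ Drawback: many loosely coupled mining systems are main-memory based, so it is difficult
to achieve high scalability and good performance with large data sets.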
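The fetch, mine, and store cycle of loose coupling can be sketched with sqlite3 standing in for the DB system. The table names, rows, and the minimum count of 2 are hypothetical choices for this example.

```python
import sqlite3
from collections import Counter

# Hypothetical purchase data held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (trans_id TEXT, item TEXT)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [
        ("T100", "bread"), ("T100", "milk"),
        ("T200", "bread"), ("T200", "milk"),
        ("T300", "butter"),
    ],
)

# 1. Fetch the data through the database's query facility.
rows = conn.execute("SELECT trans_id, item FROM purchases").fetchall()

# 2. Mine in main memory, outside the database.
item_counts = Counter(item for _, item in rows)
frequent = sorted((item, n) for item, n in item_counts.items() if n >= 2)

# 3. Store the mining results back in the database.
conn.execute("CREATE TABLE frequent_items (item TEXT, n INTEGER)")
conn.executemany("INSERT INTO frequent_items VALUES (?, ?)", frequent)
stored = conn.execute(
    "SELECT item, n FROM frequent_items ORDER BY item"
).fetchall()
print(stored)
```

Step 2 holds every fetched row in memory, which is where the scalability limit of this scheme comes from.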
Semi-tight Coupling
❖ Besides linking a DM system to a DB/DW system, efficient implementations of a
few essential data mining primitives can be provided in the DB/DW system.
❖ These primitives can include sorting, indexing, aggregation, histogram
analysis, multiway join, and precomputation of some essential statistical
measures, such as sum, count, max, min, standard deviation.
❖ Some frequently used intermediate mining results can be precomputed and stored in the
DB/DW system. Because these intermediate mining results are either precomputed or can be
computed efficiently, this design will enhance the performance of a DM
system.
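As a rough sketch of precomputation inside the database, the example below lets SQLite compute and store count, sum, min, and max, and derives the standard deviation from a stored sum of squares. The table names and values are hypothetical, and the sum-of-squares route is just one way to obtain the standard deviation from simple aggregates.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?)",
    [(v,) for v in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]],  # hypothetical values
)

# The database precomputes and stores the statistical measures.
conn.execute(
    "CREATE TABLE sales_summary AS "
    "SELECT COUNT(*) AS n, SUM(amount) AS total, "
    "SUM(amount * amount) AS total_sq, "
    "MIN(amount) AS lo, MAX(amount) AS hi FROM sales"
)

# The mining side reads the small summary instead of rescanning the data.
n, total, total_sq, lo, hi = conn.execute(
    "SELECT n, total, total_sq, lo, hi FROM sales_summary"
).fetchone()
mean = total / n
std = math.sqrt(total_sq / n - mean ** 2)  # population standard deviation
print(n, mean, std, lo, hi)
```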
Tight coupling
❖ The DM system is smoothly integrated into the DB/DW system. The data mining
subsystem is treated as one functional component of an information system.
❖ Data mining queries and functions are optimized based on mining query
analysis, data structures, indexing schemes, and query processing methods
of a DB or DW system.
❖ DM, DB, and DW systems will evolve and integrate together as one
information system with multiple functionalities. This will provide a uniform
information processing environment.
❖ It facilitates efficient implementations of data mining functions, high system
performance, and an integrated information processing environment.
Major Issues in Data Mining
Major issues in data mining concern:
❖ Mining methodology and user interaction,
❖ Performance, and
❖ Diverse data types
Mining methodology and user interaction issues
❖ Mining different kinds of knowledge in databases
❖ Interactive mining of knowledge at multiple levels of abstraction
❖ Incorporation of background knowledge
❖ Data mining query languages and ad hoc data mining
❖ Presentation and visualization of data mining results
❖ Handling noisy or incomplete data
❖ Pattern evaluation
Performance issues
❖ Efficiency and scalability of data mining algorithms:
➢ To effectively extract information from a huge amount of data in databases, data mining algorithms
must be efficient and scalable.
➢ From a database perspective on knowledge discovery, efficiency and scalability are key issues in the
implementation of data mining systems.
❖ Parallel, distributed, and incremental mining algorithms:
➢ The huge size of many databases, the wide distribution of data, and the computational complexity of
some data mining methods are factors motivating the development of parallel and distributed data
mining algorithms.
➢ Algorithms divide the data into partitions, which are processed in parallel. The results from the
partitions are then merged.
➢ The high cost of some data mining processes promotes the need for incremental data mining
algorithms that incorporate database updates without having to mine the entire data again “from
scratch.”
➢ Such algorithms perform knowledge modification incrementally to amend and strengthen what was
previously discovered.
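The partition, merge, and incremental update steps described above can be sketched with item counting as the mining task. The data and the function `count_items` are hypothetical. Threads are used only to show the structure; a process pool or a distributed framework would be the more likely choice for real speedups.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_items(partition):
    """Count, per item, the transactions in this partition that contain it."""
    return Counter(
        item for transaction in partition for item in set(transaction)
    )


# Hypothetical data, already split into two partitions.
partitions = [
    [["bread", "milk"], ["bread"]],
    [["milk", "butter"], ["bread", "milk"]],
]

# Process the partitions in parallel, then merge the partial results.
with ThreadPoolExecutor(max_workers=2) as pool:
    partial_counts = list(pool.map(count_items, partitions))
merged = sum(partial_counts, Counter())

# Incremental step: fold in new transactions without rescanning old data.
new_transactions = [["butter"], ["bread", "butter"]]
updated = merged + count_items(new_transactions)
print(merged)
print(updated)
```

Counting merges cleanly because partial counts simply add up; many mining tasks need a more careful merge step.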
Issues relating to the diversity of database types
❖ Handling of relational and complex types of data:
➢ Databases may contain complex data objects, hypertext and multimedia data, spatial data,
temporal data, or transaction data.
➢ It is unrealistic to expect one system to mine all kinds of data, given the diversity of data types and
different goals of data mining.
❖ Mining information from heterogeneous databases and global information
systems:
➢ Local- and wide-area computer networks (such as the Internet) connect many sources of data,
forming huge, distributed, and heterogeneous databases.
➢ The discovery of knowledge from different sources of structured, semistructured, or unstructured
data with diverse data semantics poses great challenges to data mining.
➢ Web mining, which uncovers interesting knowledge about Web contents, Web structures, Web
usage, and Web dynamics, becomes a very challenging and fast-evolving field in data mining.
