Kinds of Data

The document discusses various types of data repositories applicable for data mining, including relational databases, data warehouses, transactional databases, and advanced data systems. It explains the structure and functionalities of these databases, such as data organization, querying, and the importance of data mining in extracting valuable insights. Additionally, it highlights the evolution of data technology and the significance of data mining in transforming large volumes of data into actionable knowledge.

Uploaded by

13it11


Kinds of data:

Data mining should be applicable to any kind of data repository, as well as to transient data such as data streams. The repositories covered here include the following:
• Relational databases
• Data warehouses
• Transactional databases
• Advanced database systems
Relational databases:
A database system, also called a database management system (DBMS), consists of a collection of interrelated data, known as a database, and a set of software programs to manage and access the data. The software programs provide mechanisms for defining database structures, for data storage, for concurrent and shared data access, and for ensuring the consistency and security of the information stored.
A relational database is a collection of tables, each of which is assigned a unique name. Each table consists of a set of attributes (columns) and usually stores a large set of tuples (rows). Each tuple represents an object identified by a unique key and described by a set of attribute values.
The entity-relationship (ER) data model, a semantic data model, is often constructed for relational databases. It represents the database as a set of entities and their relationships.
Example:
The AllElectronics company is described by the following relation tables: customer, item, employee, and branch. The relation customer consists of a set of attributes, including a unique customer identity number (cust_ID), customer name, address, age, occupation, annual income, credit information, category, and so on. Each of the other relations is likewise described by its own set of attributes.
Figure 1.6 Fragments of relations from a relational database for AllElectronics

Relational data can be accessed by database queries written in a relational query language, such as SQL.
A query is transformed into a set of relational operations, such as join, selection, and projection, and is then optimized for efficient processing.
A query allows retrieval of specified subsets of the data. Relational languages also include aggregate functions such as sum, avg (average), count, max (maximum), and min (minimum).
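To make these query operations concrete, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The cut-down customer relation, the hypothetical item and purchases relations, and all sample rows are invented for illustration; they are not the contents of Figure 1.6.

```python
import sqlite3

# In-memory database; schema and rows are hypothetical, loosely modeled on AllElectronics.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (cust_ID TEXT PRIMARY KEY, name TEXT, age INTEGER, income REAL);
    CREATE TABLE item (item_ID TEXT PRIMARY KEY, name TEXT, category TEXT, price REAL);
    CREATE TABLE purchases (trans_ID TEXT, cust_ID TEXT, item_ID TEXT, qty INTEGER);
    INSERT INTO customer VALUES ('C1', 'Smith', 21, 27000), ('C2', 'Lee', 47, 56000);
    INSERT INTO item VALUES ('I3', 'laptop', 'computer', 1200.0), ('I8', 'phone', 'phone', 300.0);
    INSERT INTO purchases VALUES ('T100', 'C1', 'I3', 1), ('T100', 'C1', 'I8', 2), ('T200', 'C2', 'I8', 1);
""")

# Selection (WHERE), projection (SELECT list), and join (customer -> purchases -> item).
rows = conn.execute("""
    SELECT c.name, i.name
    FROM customer AS c
    JOIN purchases AS p ON p.cust_ID = c.cust_ID
    JOIN item AS i ON i.item_ID = p.item_ID
    WHERE i.category = 'phone'
    ORDER BY c.name
""").fetchall()

# Aggregate functions: total revenue (SUM) and number of purchase rows (COUNT) per customer.
totals = conn.execute("""
    SELECT p.cust_ID, SUM(p.qty * i.price) AS revenue, COUNT(*) AS n
    FROM purchases AS p JOIN item AS i ON i.item_ID = p.item_ID
    GROUP BY p.cust_ID
    ORDER BY p.cust_ID
""").fetchall()

print(rows)    # [('Lee', 'phone'), ('Smith', 'phone')]
print(totals)  # [('C1', 1800.0, 2), ('C2', 300.0, 1)]
```

The DBMS, not the user, decides how to execute the join; the query only states which subset and which aggregates are wanted.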
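Data mining applied to relational databases can go further, searching for trends or data patterns. Data mining systems can also detect deviations, such as items whose sales are far from those expected in comparison with the previous year, which can then be investigated further.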
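A minimal sketch of this kind of deviation detection, assuming per-item sales for the previous year and the current year are already available (for example, from an aggregate query). The figures and the 30% threshold are hypothetical; real systems would typically use statistical models rather than a fixed ratio.

```python
# Hypothetical yearly sales per item: (previous year, this year).
sales = {
    "laptop": (1000, 1050),
    "phone": (800, 400),
    "printer": (300, 310),
    "camera": (200, 320),
}

def deviations(sales, threshold=0.30):
    """Return items whose relative change from last year exceeds the (assumed) threshold."""
    flagged = {}
    for item, (previous, current) in sales.items():
        change = (current - previous) / previous
        if abs(change) > threshold:
            flagged[item] = round(change, 2)
    return flagged

print(deviations(sales))  # {'phone': -0.5, 'camera': 0.6}
```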
Data warehouses:
A data warehouse is a repository of information collected from multiple sources, stored under a unified schema, and usually residing at a single site. Data warehouses are constructed via a process of data cleaning, data integration, data transformation, data loading, and periodic data refreshing.
Figure 1.7 Typical framework of a data warehouse for AllElectronics
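The construction steps listed above can be sketched as a toy pipeline. Everything here is hypothetical: two branch sources with different field names and units are cleaned (incomplete and duplicate records dropped), transformed and integrated into one unified schema, and loaded into a list that stands in for the warehouse. A production warehouse would use dedicated ETL tooling; periodic refreshing would amount to rerunning the pipeline on new source data.

```python
# Hypothetical sources: A reports dollars; B uses other field names and cents.
source_a = [
    {"cust": "C1", "city": "Chicago", "amount_usd": 120.0},
    {"cust": "C1", "city": "Chicago", "amount_usd": 120.0},  # duplicate record
    {"cust": "C2", "city": None, "amount_usd": 80.0},         # missing city
]
source_b = [
    {"customer_id": "C3", "branch_city": "Toronto", "amount_cents": 4550},
]

def clean(records, required):
    """Data cleaning: drop records with missing required fields and exact duplicates."""
    seen, result = set(), []
    for r in records:
        if any(r.get(f) is None for f in required):
            continue
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            result.append(r)
    return result

def to_unified(record):
    """Transformation: map either source layout onto (cust_ID, city, sales_amount)."""
    if "amount_usd" in record:
        return {"cust_ID": record["cust"], "city": record["city"],
                "sales_amount": record["amount_usd"]}
    return {"cust_ID": record["customer_id"], "city": record["branch_city"],
            "sales_amount": record["amount_cents"] / 100}

warehouse = []  # loading target; stands in for the warehouse's storage
for source, required in [(source_a, ["cust", "city"]),
                         (source_b, ["customer_id", "branch_city"])]:
    warehouse.extend(to_unified(r) for r in clean(source, required))

print(warehouse)
```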

To facilitate decision making, the data in a data warehouse are organized around major subjects, such as customer, item, supplier, and activity. The data are stored to provide information from a historical perspective and are typically summarized.
A data warehouse is usually modeled by a multidimensional database structure, where each dimension corresponds to an attribute or a set of attributes in the schema, and each cell stores the value of some aggregate measure, such as count or sum(sales_amount).
The actual physical structure of a data warehouse may be a relational data store or a multidimensional data cube. A data cube provides a multidimensional view of data and allows the precomputation and fast accessing of summarized data.
Example: A data cube for AllElectronics. The cube has three dimensions: address (with city values Chicago, New York, Toronto, Vancouver), time (with quarter values Q1, Q2, Q3, Q4), and item (with item type values home entertainment, computer, phone, security). The aggregate value stored in each cell of the cube is sales_amount.
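A minimal sketch of this three-dimensional cube, assuming a handful of hypothetical fact records (the sales figures are invented, not taken from any figure). Each cell is keyed by (city, quarter, item type) and stores sum(sales_amount); every summary cuboid is precomputed by summing over the dimensions that are aggregated away, with "*" standing for "all".

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact records: (city, quarter, item_type, sales_amount).
facts = [
    ("Chicago", "Q1", "computer", 440),
    ("Chicago", "Q1", "computer", 100),
    ("Chicago", "Q2", "phone", 90),
    ("Toronto", "Q1", "computer", 395),
    ("Vancouver", "Q1", "security", 60),
]
DIMS = ("address", "time", "item")

def build_cube(facts):
    """Precompute all 2**3 cuboids; '*' marks a dimension that has been aggregated away."""
    cube = defaultdict(int)
    for *coords, amount in facts:
        for k in range(len(DIMS) + 1):
            for kept in combinations(range(len(DIMS)), k):
                cell = tuple(c if i in kept else "*" for i, c in enumerate(coords))
                cube[cell] += amount
    return cube

cube = build_cube(facts)
print(cube[("Chicago", "Q1", "computer")])  # base cell: 540
print(cube[("*", "Q1", "computer")])        # all cities: 935
print(cube[("*", "*", "*")])                # apex cuboid (grand total): 1085
```

Precomputing every cuboid trades storage for speed: any summary query becomes a single dictionary lookup.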
Difference between data warehouse and data mart: A data warehouse collects information about subjects that span an entire organization, and thus its scope is enterprise-wide. A data mart is a departmental subset of a data warehouse. It focuses on selected subjects, and thus its scope is department-wide.
By providing multidimensional data views and the precomputation of summarized data, data warehouse systems are well suited for online analytical processing (OLAP). OLAP operations use background knowledge about the domain of the data being studied to allow the presentation of data at different levels of abstraction. Examples of OLAP operations include drill-down and roll-up, which allow the user to view the data at differing degrees of summarization.
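Roll-up can be sketched as climbing a concept hierarchy. Here the address dimension is rolled up from city to country using the hierarchy Chicago, New York → USA and Toronto, Vancouver → Canada; the cell values are hypothetical. Drill-down is the reverse operation: it returns to the finer city-level cells, which is possible because they are still stored.

```python
from collections import defaultdict

# Hypothetical city-level cells: (city, quarter) -> sales_amount.
city_cells = {
    ("Chicago", "Q1"): 540, ("New York", "Q1"): 1087,
    ("Toronto", "Q1"): 395, ("Vancouver", "Q1"): 605,
    ("Chicago", "Q2"): 90,
}
# Concept hierarchy for the address dimension: city -> country.
CITY_TO_COUNTRY = {"Chicago": "USA", "New York": "USA",
                   "Toronto": "Canada", "Vancouver": "Canada"}

def roll_up(cells, hierarchy):
    """Aggregate cells one level up the hierarchy (city -> country)."""
    coarse = defaultdict(int)
    for (city, quarter), amount in cells.items():
        coarse[(hierarchy[city], quarter)] += amount
    return dict(coarse)

def drill_down(cells, hierarchy, country, quarter):
    """Return the finer city-level cells that make up one country-level cell."""
    return {city: amount for (city, q), amount in cells.items()
            if q == quarter and hierarchy[city] == country}

by_country = roll_up(city_cells, CITY_TO_COUNTRY)
print(by_country)  # {('USA', 'Q1'): 1627, ('Canada', 'Q1'): 1000, ('USA', 'Q2'): 90}
print(drill_down(city_cells, CITY_TO_COUNTRY, "Canada", "Q1"))  # {'Toronto': 395, 'Vancouver': 605}
```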
Figure 1.8 Multidimensional data cube commonly used for data warehousing

Transactional databases:
In general, a transactional database consists of a file where each record represents a transaction. A transaction typically includes a unique transaction identity number (trans_ID) and a list of the items making up the transaction, such as the items purchased in a store.
Figure 1.9 Fragment of a transactional database for sales at AllElectronics

The transactional database may have additional tables associated with it, which contain other information regarding the sale, such as the date of the transaction, the customer ID number, the ID number of the salesperson, and the branch at which the sale occurred.
Example 1.3 A transactional database for AllElectronics. Transactions can be stored in a table, with one record per transaction. A transactional database is usually stored in a flat file, or it can be unfolded into a standard relation. Market basket data analysis enables you to bundle groups of items together as a strategy for maximizing sales. Data mining systems for transactional data can identify frequent itemsets, that is, sets of items that are frequently sold together.
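A minimal sketch of both ideas in Example 1.3: hypothetical transactions (in the spirit of Figure 1.9, not copied from it) are unfolded into a standard (trans_ID, item_ID) relation, and every 2-itemset whose support count reaches a minimum threshold is found by brute-force counting. Practical systems use dedicated algorithms such as Apriori or FP-growth instead.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: trans_ID -> list of item_IDs.
transactions = {
    "T100": ["I1", "I3", "I8"],
    "T200": ["I2", "I8"],
    "T300": ["I1", "I3", "I8"],
    "T400": ["I1", "I8"],
}

# Unfold into a standard relation: one (trans_ID, item_ID) tuple per row.
relation = [(tid, item) for tid, items in transactions.items() for item in items]

def frequent_pairs(transactions, min_support):
    """Count each 2-itemset; keep those present in at least min_support transactions."""
    counts = Counter()
    for items in transactions.values():
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

print(len(relation))                    # 10
print(frequent_pairs(transactions, 3))  # {('I1', 'I8'): 3}
```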
Advanced data and information systems and advanced applications:
New database applications include handling spatial data (such as maps), engineering design data (such as integrated circuits and system components), hypertext and multimedia data, time-related data, stream data, and the World Wide Web. These applications require efficient data structures and scalable methods for handling complex object structures; variable-length records; semistructured or unstructured data; text, spatiotemporal, and multimedia data; and database schemas with complex structures and dynamic changes.
Advanced database systems and specific application-oriented database systems include object-relational database systems, temporal and time-series database systems, spatial and spatiotemporal database systems, text and multimedia database systems, and Web-based global information systems.
While such databases require sophisticated facilities to store, retrieve, and update large amounts of complex data efficiently, they also provide fertile ground and raise many challenging research and implementation issues for data mining.
Object-relational databases:
Object-relational databases are constructed based on an object-relational data model. This model extends the relational model by providing a rich data type for handling complex objects and object orientation.
The object-relational data model inherits the essential concepts of object-oriented databases, where, in general terms, each entity is considered as an object. Data and code relating to an object are encapsulated into a single unit. Each object has the following associated with it:
• A set of variables that describe the object. These correspond to attributes in the entity-relationship and relational models.
• A set of messages that the object can use to communicate with other objects, or with the rest of the database system.
• A set of methods, where each method holds the code to implement a message. Upon receiving a message, the method returns a value in response.
Objects that share a common set of properties can be grouped into an object class. Each object is an instance of its class. Object classes can be organized into class/subclass hierarchies so that each class represents properties that are common to objects in that class. For example, the class salesperson can be a subclass of the class employee. A salesperson object inherits all of the variables pertaining to its superclass, employee, and can also have variables of its own, such as commission. Such a class inheritance feature benefits information sharing.
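The class/subclass relationship can be illustrated with ordinary Python classes. This is an analogy, not an object-relational DBMS: instance attributes play the role of variables, and calling a method plays the role of sending a message. The particular variables (emp_id, name, salary, commission) and the annual_pay message are hypothetical.

```python
class Employee:
    def __init__(self, emp_id, name, salary):
        # Variables describing the object (attributes in the ER/relational models).
        self.emp_id = emp_id
        self.name = name
        self.salary = salary

    def annual_pay(self):
        # Method implementing the hypothetical "annual_pay" message; returns a value.
        return self.salary


class SalesPerson(Employee):
    def __init__(self, emp_id, name, salary, commission):
        super().__init__(emp_id, name, salary)  # inherits employee's variables
        self.commission = commission            # variable specific to salespersons

    def annual_pay(self):
        # Same message, refined implementation for the subclass.
        return self.salary + self.commission


staff = [Employee("E1", "Lee", 50000), SalesPerson("E2", "Smith", 40000, 12000)]
for person in staff:
    print(person.name, person.annual_pay())
# Lee 50000
# Smith 52000
print(isinstance(staff[1], Employee))  # True: a salesperson is also an employee
```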
Temporal databases, sequence databases, and time-series databases:
• A temporal database stores relational data that include time-related attributes. These attributes may involve several timestamps, each having different semantics.
• A sequence database stores sequences of ordered events, with or without a concrete notion of time.
• A time-series database stores sequences of values or events obtained over repeated measurements of time (for example, hourly, daily, or weekly).
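The three kinds of time-related data can be contrasted with small hypothetical records: the temporal tuple carries two timestamps with different meanings, the sequence keeps only the order of events, and the time series holds values measured at regular intervals. A simple moving average is included as one illustrative way to expose a trend in a time series; it is an assumption of this sketch, not the only technique.

```python
from datetime import date

# Temporal database: a relational tuple with two timestamps of different meaning (hypothetical).
employee_salary = {"emp_id": "E1", "salary": 50000,
                   "valid_from": date(2023, 1, 1),    # when the fact became true
                   "recorded_on": date(2023, 2, 15)}  # when it was entered

# Sequence database: ordered events, no concrete notion of time required (hypothetical).
click_sequence = ["home", "laptops", "laptop_I3", "cart", "checkout"]

# Time-series database: values from repeated measurements, e.g. daily closing prices (hypothetical).
daily_price = [100.0, 102.0, 101.0, 105.0, 107.0, 106.0]

def moving_average(series, window):
    """Average of each run of `window` consecutive measurements."""
    return [sum(series[i:i + window]) / window for i in range(len(series) - window + 1)]

print([round(x, 2) for x in moving_average(daily_price, 3)])  # [101.0, 102.67, 104.33, 106.0]
```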

Spatial databases and spatiotemporal databases:

Spatial databases contain spatial-related information. Examples include geographic (map) databases, very large-scale integration (VLSI) or computer-aided design databases, and medical and satellite image databases.
Spatial data may be represented in raster format, consisting of n-dimensional bit maps or pixel maps. For example, a 2-D satellite image may be represented as raster data.
Maps can be represented in vector format, where roads, bridges, buildings, and lakes are represented as unions or overlays of basic geometric constructs, such as points, lines, polygons, and the networks formed by these components.
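Raster and vector representations can be contrasted in a few lines. The raster is a 2-D grid of pixel values (here a hypothetical 4x4 land/water mask), while the vector layer stores a road as a line and a lake as a polygon, whose area is computed with the shoelace formula. All coordinates are invented for illustration.

```python
# Raster: a 2-D pixel map; 1 = water, 0 = land (hypothetical values).
raster = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
water_pixels = sum(sum(row) for row in raster)

# Vector: geometric constructs built from coordinates (hypothetical).
bridge = (2.0, 3.0)                                     # a point
road = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0)]             # a line (polyline)
lake = [(1.0, 1.0), (3.0, 1.0), (3.0, 3.0), (1.0, 3.0)]  # a polygon

def polygon_area(vertices):
    """Shoelace formula for a simple polygon given its vertices in order."""
    n = len(vertices)
    s = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2

def polyline_length(points):
    """Total length of a line made of straight segments."""
    return sum(((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

print(water_pixels)           # 4
print(polygon_area(lake))     # 4.0
print(polyline_length(road))  # 8.0
```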
Geographic databases have numerous applications, ranging from forestry and ecology planning to providing public service information regarding the location of telephone and electric cables, pipes, and sewage systems. Data mining on spatial databases may uncover patterns describing, for example, the characteristics of houses located near a specified kind of location, such as a park, or the climate of mountainous areas located at various altitudes.
The relationships among a set of spatial objects can be examined in order to discover which subsets of objects are spatially autocorrelated or associated. A spatial database that stores spatial objects that change with time is called a spatiotemporal database. Mining such data can, for example, identify the trends of moving objects or detect strangely moving vehicles.
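One simple way to flag strangely moving vehicles is to compute speeds from timestamped positions and report vehicles that exceed a plausible limit. The trajectories, the coordinate units (kilometres), and the 150 km/h limit are all hypothetical; real spatiotemporal mining would use richer models of normal movement.

```python
from math import hypot

# Hypothetical trajectories: vehicle -> [(hours since start, x_km, y_km), ...]
trajectories = {
    "truck_1": [(0.0, 0.0, 0.0), (1.0, 60.0, 0.0), (2.0, 120.0, 10.0)],
    "car_7":   [(0.0, 5.0, 5.0), (0.5, 5.0, 45.0), (1.0, 5.0, 150.0)],
}

def max_speed(points):
    """Highest speed (km/h) between consecutive timestamped positions."""
    return max(hypot(x2 - x1, y2 - y1) / (t2 - t1)
               for (t1, x1, y1), (t2, x2, y2) in zip(points, points[1:]))

def strange_vehicles(trajectories, limit_kmh=150.0):
    """Vehicles whose movement exceeds the (assumed) speed limit at any point."""
    return sorted(v for v, pts in trajectories.items() if max_speed(pts) > limit_kmh)

print(strange_vehicles(trajectories))  # ['car_7']
```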
Text databases and multimedia databases:
Text databases are databases that contain word descriptions for objects. These word descriptions are usually not simple keywords but rather long sentences or paragraphs, such as product specifications, error or bug reports, warning messages, and summary reports.
Text databases may be highly unstructured (such as some Web pages), semistructured (such as e-mail messages and many HTML/XML Web pages), or relatively well structured (such as library catalogue databases).
By mining text data, one may uncover general and concise descriptions of the text documents, as well as keyword or content associations. To do this, standard data mining methods need to be integrated with information retrieval techniques and with the construction or use of hierarchies specifically for text data, such as dictionaries and thesauruses.
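A minimal sketch of keyword association mining over text: each hypothetical document is reduced to a set of keywords (lowercased, minus a tiny stop-word list that is an assumption of this sketch), and keyword pairs that co-occur in at least two documents are reported. Real systems would add information retrieval steps such as stemming and term weighting.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical documents (bug reports and a summary report).
documents = [
    "Printer error: paper jam reported after driver update.",
    "Bug report: driver update causes printer error on startup.",
    "Summary report of quarterly laptop sales.",
]
STOP_WORDS = {"a", "after", "of", "on", "the"}  # assumed, deliberately tiny

def keywords(text):
    """Lowercased words minus stop words, as a set (presence, not frequency)."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS}

def keyword_associations(docs, min_docs=2):
    """Keyword pairs appearing together in at least min_docs documents."""
    counts = Counter()
    for doc in docs:
        counts.update(combinations(sorted(keywords(doc)), 2))
    return sorted(pair for pair, n in counts.items() if n >= min_docs)

print(keyword_associations(documents))
# [('driver', 'error'), ('driver', 'printer'), ('driver', 'update'),
#  ('error', 'printer'), ('error', 'update'), ('printer', 'update')]
```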
Multimedia databases store image, audio, and video data. They are used in applications such as picture content-based retrieval, voice-mail systems, video-on-demand systems, the World Wide Web, and speech-based user interfaces.
Multimedia databases must support large objects, because data objects such as video can require gigabytes of storage. Video and audio data require real-time retrieval at a steady and predetermined rate to avoid picture or sound gaps and system buffer overflows. Such data are referred to as continuous-media data.
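The storage and steady-rate requirements follow from simple arithmetic: size = bit rate × duration. The 5 Mbit/s bit rate below is an assumed figure for compressed video, not one taken from the text.

```python
def storage_gb(bitrate_mbps, seconds):
    """Storage in gigabytes (10**9 bytes) for a stream at a constant bit rate."""
    bits = bitrate_mbps * 1_000_000 * seconds
    return bits / 8 / 1_000_000_000

def bytes_per_second(bitrate_mbps):
    """Rate the server must sustain to avoid picture or sound gaps."""
    return bitrate_mbps * 1_000_000 // 8

two_hours = 2 * 60 * 60
print(storage_gb(5, two_hours))  # 4.5 (GB for a two-hour video at an assumed 5 Mbit/s)
print(bytes_per_second(5))       # 625000
```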
Heterogeneous databases and legacy databases:
A heterogeneous database consists of a set of interconnected, autonomous component databases. The components communicate in order to exchange information and answer queries.
A legacy database is a group of heterogeneous databases that combines different kinds of data systems, such as relational or object-oriented databases, hierarchical databases, network databases, spreadsheets, or multimedia databases. These databases may be connected by intra- or inter-computer networks.
Information exchange across such databases is difficult because it requires precise transformation rules from one representation to another, taking semantics into account.
Data mining techniques may provide an interesting solution to the information exchange problem by performing statistical data distribution and correlation analysis, and by transforming the given data into higher, more generalized, conceptual levels, at which information exchange can be performed more easily.
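As a sketch of how generalization can ease information exchange, assume two hypothetical component databases record student grades differently, one as percentages and one as letter grades. Mapping both to shared concept levels (fair, good, excellent) gives them a common vocabulary. The concept levels, cut-off values, and letter mappings are all assumptions for illustration.

```python
# Hypothetical transformation rules from each component database to concept levels.
def from_percentage(score):
    if score >= 85:
        return "excellent"
    if score >= 70:
        return "good"
    return "fair"

LETTER_RULES = {"A": "excellent", "B": "good", "C": "fair", "D": "fair"}

db_percent = {"S1": 91, "S2": 74, "S3": 58}  # component database 1 (hypothetical)
db_letter = {"S4": "A", "S5": "C"}           # component database 2 (hypothetical)

# Generalize both representations to the shared conceptual level.
unified = {sid: from_percentage(s) for sid, s in db_percent.items()}
unified.update({sid: LETTER_RULES[g] for sid, g in db_letter.items()})
print(unified)
# {'S1': 'excellent', 'S2': 'good', 'S3': 'fair', 'S4': 'excellent', 'S5': 'fair'}
```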
Data streams:
Many applications involve the generation and analysis of a new kind of data, called stream data, where data flow in and out of an observation platform dynamically. Such data streams have unique features: huge or possibly infinite volume, dynamic change, flow in and out in a fixed order, support for only one or a small number of scans, and a demand for fast response time.
Effective and efficient management and analysis of stream data pose great challenges to researchers. A typical query model in such a system is the continuous query model, where predefined queries constantly evaluate incoming streams, collect aggregate data, report the current status of data streams, and respond to their changes.
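The continuous query model can be sketched as a generator that reads each element exactly once, keeps only a fixed-size sliding window, and emits an updated aggregate after every arrival. The window size, the sensor readings, and the alert threshold are hypothetical.

```python
from collections import deque

def continuous_average(stream, window=3, alert_above=30.0):
    """Single pass over the stream; yields (current window average, alert flag)."""
    buf = deque(maxlen=window)
    total = 0.0
    for value in stream:
        if len(buf) == window:
            total -= buf[0]  # this value is evicted when the new one is appended
        buf.append(value)
        total += value
        avg = total / len(buf)
        yield avg, avg > alert_above

readings = iter([20.0, 22.0, 24.0, 40.0, 50.0])  # any iterator, possibly unbounded
for avg, alert in continuous_average(readings):
    print(round(avg, 2), alert)
# 20.0 False
# 21.0 False
# 22.0 False
# 28.67 False
# 38.0 True
```

Memory use depends only on the window size, not on how many elements have arrived, which is what makes the approach usable on possibly infinite streams.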
Mining data streams involves the efficient discovery of general patterns and dynamic changes within stream data. Most stream data reside at a low level of abstraction, whereas analysts are often more interested in higher and multiple levels of abstraction.
The World Wide Web:
The World Wide Web and its associated distributed information services, such as Yahoo!, Google, America Online, and AltaVista, provide rich online information services, where data objects are linked together to facilitate interactive access. Users seeking information of interest traverse from one object via links to another. Understanding such user access or navigation patterns can help improve system design and can also lead to better marketing decisions.
Capturing user access patterns in such distributed information environments is called Web usage mining. Web pages can be highly unstructured and lack a predefined schema, type, or pattern. Thus it is difficult for computers to understand the semantic meaning of diverse Web pages and to structure them in an organized way for systematic information retrieval and data mining. Keyword-based searches offer only limited help to users.
Data mining can provide additional help here. For example, authoritative Web page analysis based on the linkages among Web pages can help rank Web pages. Automated Web page clustering and classification help group and arrange Web pages in a multidimensional manner based on their contents.
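Link-based ranking can be illustrated with PageRank, one well-known authoritative-page technique (the text does not name a specific algorithm). This sketch runs power iteration on a hypothetical four-page link graph with the commonly used damping factor 0.85; every page has at least one out-link, so dangling pages are not handled.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power iteration: a page is important if important pages link to it."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            share = damping * rank[page] / len(outgoing)  # split rank over out-links
            for target in outgoing:
                new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical link graph: page -> pages it links to.
web = {
    "home": ["news", "shop"],
    "news": ["home"],
    "shop": ["home"],
    "blog": ["home", "shop"],
}
ranks = pagerank(web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

Here "home" ranks highest because every other page links to it, while "blog" has no in-links and keeps only the baseline share.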
Web community analysis helps identify hidden Web social networks and communities and observe their evolution. Web mining is the development of scalable and effective Web data analysis and mining methods. It helps us learn about the Web, characterize and classify Web pages, and uncover Web dynamics and the associations among different Web pages, users, communities, and Web-based activities.

Importance of Data Mining:

The information and knowledge gained by data mining can be used for applications ranging from market analysis, fraud detection, and customer retention to production control and science exploration. Data mining can be viewed as a result of the natural evolution of information technology, which has developed along the following functionalities:
• Data collection
• Database creation
• Data management
• Advanced data analysis
Since the 1960s, database and information technology has been evolving systematically from primitive file processing systems to sophisticated and powerful database systems. Efficient methods for online transaction processing (OLTP), where a query is viewed as a read-only transaction, have contributed substantially to the wide acceptance of relational technology as a major tool for efficient storage, retrieval, and management of large amounts of data.
This evolution has also promoted the development of advanced data models, such as extended-relational, object-oriented, object-relational, and deductive models, and of application-oriented database systems, including spatial, temporal, multimedia, active, stream, sensor, and scientific and engineering databases, as well as knowledge bases.
One data repository architecture that has emerged is the data warehouse, a repository of multiple heterogeneous data sources organized under a unified schema at a single site to facilitate management decision making. Data warehouse technology includes data cleaning, data integration, and online analytical processing (OLAP), that is, analysis techniques with functionalities such as summarization, consolidation, and aggregation.
In addition, huge volumes of data can be accumulated beyond databases and data warehouses. Examples include the World Wide Web and data streams, where data flow in and out like streams, as in applications such as video surveillance, telecommunication, and sensor networks. The abundance of data, coupled with the need for powerful data analysis tools, has been described as a “data rich but information poor” situation. The widening gap between data and information calls for the systematic development of data mining tools that will turn data tombs into “golden nuggets” of knowledge.
