Major Components of Data Mining System
Kinds of data:
Data mining should be applicable to any kind of data repository, as well as to
transient data such as data streams. The repositories covered here are the following.
Relational databases
Data warehouses
Transactional databases
Advanced database systems
Relational databases:
A database system, also called a database management system (DBMS),
consists of a collection of interrelated data, known as a database, and a set of
software programs to manage and access the data. The software programs
provide mechanisms for defining database structures, for data storage, for
concurrent and shared data access, and for ensuring consistency and security.
A relational database is a collection of tables, each of which is assigned a unique
name. Each table consists of a set of columns (attributes) and rows (tuples). Each
row represents an object identified by a unique key and described by a set of
attribute values.
The entity-relationship (ER) model, a semantic data model, represents the
database as a set of entities and their relationships.
Example:
The AllElectronics company is described by the following relation tables:
customer, item, employee, and branch.
The relation customer consists of a set of attributes, including a unique
customer identity number (cust_id), customer name, address, age, occupation,
annual income, credit information, category, and so on. Each of the other tables
is likewise described by its own set of attributes.
Figure 1.6 Fragments of relations from a relational database for AllElectronics
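The idea of a table whose rows are identified by a unique key can be sketched with Python's built-in sqlite3 module. This is only an illustration: the column subset, the sample rows, and the helper customers_by_occupation are assumptions made for the example and are not taken from Figure 1.6.

```python
import sqlite3

# In-memory database. The schema is a simplified, assumed subset of the
# customer attributes listed in the text.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customer (
        cust_id    INTEGER PRIMARY KEY,  -- unique key identifying each row
        name       TEXT NOT NULL,
        age        INTEGER,
        occupation TEXT,
        income     REAL
    )
    """
)

# Made-up sample rows, for illustration only.
rows = [
    (1, "Ann Lee", 34, "engineer", 72000.0),
    (2, "Raj Patel", 45, "teacher", 51000.0),
    (3, "Mia Chen", 29, "engineer", 68000.0),
]
conn.executemany("INSERT INTO customer VALUES (?, ?, ?, ?, ?)", rows)


def customers_by_occupation(connection, occupation):
    """Return (cust_id, name) pairs for one occupation, ordered by key."""
    cursor = connection.execute(
        "SELECT cust_id, name FROM customer "
        "WHERE occupation = ? ORDER BY cust_id",
        (occupation,),
    )
    return cursor.fetchall()


engineers = customers_by_occupation(conn, "engineer")
print(engineers)
```

Because cust_id is declared as the primary key, an attempt to insert a second row with an existing cust_id is rejected by the database, which is what "unique key" means in practice.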
Transactional databases:
A transactional database consists of a file where each record represents a
transaction. A transaction typically includes a unique transaction identity number
and a list of the items making up the transaction.
Figure 1.9 Fragment of a transactional database for sales at AllElectronics
The transactional database may have additional tables associated with it that
hold other information about the sale, such as the date of the transaction, the
customer ID number, the ID number of the salesperson, and the ID of the branch.
Example 1.3 A transactional database for AllElectronics. Transactions can be
stored in a table, with one record per transaction. A transactional database is
usually stored in a flat file or unfolded into a standard relation. Market basket
data analysis finds items that are frequently sold together, which lets you bundle
groups of items as a strategy for maximizing sales. Data mining systems for
transactional data can identify such frequent itemsets, that is, sets of items
that are frequently sold together.
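As a rough illustration of what "frequent itemsets" means, the sketch below counts how often each pair of items appears in the same transaction and keeps the pairs that reach a minimum count. The transactions, the item names, and the function frequent_pairs are hypothetical, and the brute-force counting shown here is not a specific mining algorithm from the text; real systems use more scalable methods.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: transaction ID -> list of items sold.
transactions = {
    "T100": ["computer", "printer", "cable"],
    "T200": ["mouse", "cable"],
    "T300": ["computer", "printer"],
    "T400": ["computer", "mouse", "printer"],
}


def frequent_pairs(transactions, min_support):
    """Return item pairs found together in at least min_support transactions."""
    counts = Counter()
    for items in transactions.values():
        # Deduplicate and sort so each pair has a single canonical form.
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


pairs = frequent_pairs(transactions, min_support=2)
print(pairs)
```

With this sample data, only the pair (computer, printer) appears in at least two transactions, so it would be the candidate for bundling.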
Advanced data and information systems and advanced applications:
New database applications include handling spatial data (such as maps),
engineering design data (such as integrated circuits and system components),
hypertext and multimedia data, time-related data, stream data, and the World
Wide Web. These applications require efficient data structures and scalable
methods for handling complex object structures; variable-length records;
semistructured or unstructured data; text, spatiotemporal, and multimedia data;
and database schemas that change dynamically.
Advanced database systems and specific application-oriented database systems
include object-relational database systems, temporal and time-series database
systems, spatial and spatiotemporal database systems, text and multimedia
database systems, and Web-based global information systems.
These databases require sophisticated facilities to store, retrieve, and update
large amounts of complex data. They provide fertile ground and raise many
challenging research and implementation issues for data mining.
Object-relational databases:
Object-relational databases are constructed based on an object-relational data
model. This model extends the relational model by providing a rich data type for
handling complex objects and object orientation.
The object-relational data model inherits the essential concepts of
object-oriented databases, where each entity is considered as an object. Data and
code relating to an object are encapsulated into a single unit. Each object has
the following associated with it:
A set of variables that describe the object. These correspond to attributes in the
entity-relationship and relational models.
A set of messages that the object can use to communicate with other objects, or
with the rest of the database system.
A set of methods, where each method holds the code to implement a message.
Upon receiving a message, the method returns a value in response.
Objects that share a common set of properties can be grouped into an object
class. Each object is an instance of its class. Object classes can be organized into
class/subclass hierarchies so that each class represents properties that are
common to objects in that class. For example, salesperson is a subclass of the
class employee. A salesperson object inherits all of the variables pertaining
to its superclass, employee. Such a class inheritance feature benefits
information sharing.
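The class and subclass relationship described above can be sketched in Python. The attribute names (emp_id, name, salary, commission) and the method get_annual_pay are assumptions chosen for illustration; the text only states that salesperson is a subclass of employee.

```python
class Employee:
    """Object class: variables describe the object, methods answer messages."""

    def __init__(self, emp_id, name, salary):
        # Variables, corresponding to attributes in the relational model.
        self.emp_id = emp_id
        self.name = name
        self.salary = salary

    def get_annual_pay(self):
        # Method implementing the "get_annual_pay" message.
        return self.salary


class SalesPerson(Employee):
    """Subclass: inherits every Employee variable and adds its own."""

    def __init__(self, emp_id, name, salary, commission):
        super().__init__(emp_id, name, salary)
        self.commission = commission

    def get_annual_pay(self):
        # Same message, specialized response for the subclass.
        return self.salary + self.commission


sp = SalesPerson(7, "Dana", 40000, 5000)
print(sp.name, sp.get_annual_pay())
```

Here sp carries emp_id, name, and salary without SalesPerson redefining them, which is the information sharing that inheritance provides.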
Temporal databases, sequence databases, and time-series databases:
A temporal database stores relational data that include time-related attributes.
These attributes may involve several timestamps, each having different
semantics.
A sequence database stores sequences of ordered events, with or without a
concrete notion of time.
A time-series database stores sequences of values or events obtained from
repeated measurements over time.
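The difference between the three kinds of data can be seen in how a single record might look in each. The sketch below is illustrative only: the class PriceRecord, its fields, and all sample values are made up, and the two timestamps are just one example of "different semantics".

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class PriceRecord:
    """Temporal data: relational attributes plus time-related attributes."""
    item_id: str
    price: float
    valid_from: date   # when the price takes effect in the real world
    recorded_on: date  # when the row was entered into the database


record = PriceRecord("I8", 19.99, date(2024, 3, 1), date(2024, 2, 20))

# Sequence data: ordered events; only the order matters, no timestamps.
click_sequence = ["home", "search", "item_page", "checkout"]

# Time-series data: values from repeated measurements over time.
daily_sales = [
    (date(2024, 1, 1), 120),
    (date(2024, 1, 2), 135),
    (date(2024, 1, 3), 128),
]


def is_time_ordered(series):
    """Check that the measurement dates are strictly increasing."""
    return all(a[0] < b[0] for a, b in zip(series, series[1:]))


print(is_time_ordered(daily_sales))
```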
Result Analysis
Fig. 2.1 Steps in Knowledge