Data Mining L-3,4
IIIT Surat
Data for Data Mining
❖ Data mining should be applicable to any kind of data repository, as well as to
transient data, such as data streams.
➢ Relational databases,
➢ data warehouses,
➢ transactional databases,
➢ advanced database systems,
➢ flat files,
➢ data streams, and
➢ the World Wide Web.
❖ Advanced database systems include object-relational databases and
specific application-oriented databases: spatial databases, time-series
databases, text databases, and multimedia databases.
Relational Databases
❖ A relational database is a collection of tables, each of which is assigned a
unique name.
❖ Each table consists of a set of attributes (columns) and usually stores a large
set of tuples (records or rows).
❖ Each tuple in a relational table represents an object identified by a unique key
and described by a set of attribute values.
❖ A semantic data model, such as an entity-relationship (ER) data model, is
often constructed for relational databases.
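A minimal sketch of these ideas, using Python's built-in sqlite3 module as one possible relational store. The table name `customer`, its attributes, and the sample tuples are hypothetical. The sketch shows a uniquely named table, a set of attributes, and a key that identifies each tuple and rejects duplicates.

```python
import sqlite3

# In-memory relational database; table/column names are hypothetical.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# A uniquely named table ("customer") with a set of attributes (columns).
# cust_ID is the key that uniquely identifies each tuple (row).
cur.execute(
    """
    CREATE TABLE customer (
        cust_ID INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        age     INTEGER,
        income  REAL
    )
    """
)

# Each tuple describes one object (a customer) by its attribute values.
cur.executemany(
    "INSERT INTO customer (cust_ID, name, age, income) VALUES (?, ?, ?, ?)",
    [
        (1, "Asha", 34, 52000.0),
        (2, "Ravi", 28, 41000.0),
        (3, "Meera", 45, 67000.0),
    ],
)
conn.commit()

# The key lets us retrieve exactly one tuple.
row = cur.execute(
    "SELECT name, age FROM customer WHERE cust_ID = ?", (2,)
).fetchone()
print(row)  # ('Ravi', 28)

# A second tuple with an existing key violates uniqueness and is rejected.
duplicate_rejected = False
try:
    cur.execute("INSERT INTO customer VALUES (2, 'Duplicate', 30, 0.0)")
except sqlite3.IntegrityError:
    duplicate_rejected = True
print("duplicate rejected:", duplicate_rejected)  # duplicate rejected: True
```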
Data Warehouses
❖ A data warehouse is a repository of information collected from multiple sources
that is stored under a unified schema and usually resides at a single site.
❖ Data warehouses are constructed via a process of data cleaning, data
integration, data transformation, data loading, and periodic data refreshing.
❖ A data warehouse is usually modeled by a multidimensional database structure,
where each dimension corresponds to an attribute or a set of attributes in the
schema, and each cell stores the value of some aggregate measure.
❖ The actual physical structure of a data warehouse may be a relational data
store or a multidimensional data cube.
Fig: Framework of a data warehouse
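To make the multidimensional view concrete, here is a minimal sketch of a tiny data cube in plain Python. The dimensions (city, quarter, item type), the measure (dollars sold), and the records are all hypothetical. Each cell is identified by one value per dimension, with `*` meaning "all values" of that dimension, and stores the aggregate (here, a sum) of the measure.

```python
from collections import defaultdict
from itertools import product

# Hypothetical fact records: (city, quarter, item_type, dollars_sold).
sales = [
    ("Surat", "Q1", "phone", 400),
    ("Surat", "Q1", "laptop", 900),
    ("Surat", "Q2", "phone", 350),
    ("Pune", "Q1", "phone", 500),
    ("Pune", "Q2", "laptop", 1200),
]

ALL = "*"  # marks a dimension that has been aggregated away


def build_cube(records):
    """Return {cell: total}: one cell per combination of dimension values,
    including '*' for every subset of dimensions (all cuboids)."""
    cube = defaultdict(int)
    for *dims, measure in records:
        # Each record contributes to 2^n cells: every way of keeping a
        # dimension value or replacing it with '*'.
        for mask in product((True, False), repeat=len(dims)):
            cell = tuple(d if keep else ALL for d, keep in zip(dims, mask))
            cube[cell] += measure
    return dict(cube)


cube = build_cube(sales)
print(cube[("Surat", "Q1", ALL)])  # 1300: Surat, Q1, all item types
print(cube[(ALL, ALL, "phone")])   # 1250: phones, all cities and quarters
print(cube[(ALL, ALL, ALL)])       # 3350: grand total
```

Precomputing every cuboid this way is simple but grows as 2^n per record; real warehouse systems choose which cuboids to materialize.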
Transactional Databases
❖ A transactional database consists of a file where each record represents a
transaction. A transaction typically includes a unique transaction identity
number (trans ID) and a list of the items making up the transaction.
❖ The transactional database may have additional tables containing other
information about each transaction, such as the date of the transaction, the
customer ID number, and the ID numbers of the salesperson and of the branch.
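A minimal sketch of a transactional database in plain Python, assuming hypothetical transaction IDs, items, and an additional table keyed by trans ID. It shows a typical market-basket question (which item pairs are bought together) and a join with the additional table.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactional "file": one record per transaction,
# holding a unique trans ID and the list of items bought.
transactions = [
    ("T100", ["bread", "milk", "eggs"]),
    ("T200", ["milk", "butter"]),
    ("T300", ["bread", "milk"]),
]

# Hypothetical additional table keyed by trans ID.
trans_info = {
    "T100": {"date": "2024-01-05", "cust_ID": "C07", "salesperson_ID": "S2", "branch_ID": "B1"},
    "T200": {"date": "2024-01-05", "cust_ID": "C11", "salesperson_ID": "S4", "branch_ID": "B2"},
    "T300": {"date": "2024-01-06", "cust_ID": "C07", "salesperson_ID": "S2", "branch_ID": "B1"},
}


def pair_counts(records):
    """Count how many transactions contain each pair of items."""
    counts = Counter()
    for _tid, items in records:
        # sorted() so that ("bread", "milk") and ("milk", "bread") are one pair
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    return counts


counts = pair_counts(transactions)
print(counts[("bread", "milk")])  # 2

# Join on trans ID: all transactions made by customer C07.
c07 = [tid for tid, _ in transactions if trans_info[tid]["cust_ID"] == "C07"]
print(c07)  # ['T100', 'T300']
```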
Advanced Data and Information Systems and Advanced Applications
No Coupling
❖ The data mining system does not use any function of a database or data warehouse
system, i.e., there is no communication with a database. It communicates with
other storage, such as a file system.
❖ Drawbacks:
➢ A DB system provides a great deal of flexibility and efficiency in storing, organizing, accessing, and
processing data. Without a DB/DW system, a DM system may spend a substantial amount of
time finding, collecting, cleaning, and transforming data.
➢ The DM system must use other tools to extract data, which makes it difficult to integrate such a system
into an information processing environment. Thus, no coupling represents a poor design.
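A minimal sketch of the first drawback, assuming a hypothetical flat CSV file (simulated here in memory) and a hypothetical "mining" step that just averages income. With no coupling, the mining program must itself do the parsing, type conversion, and cleaning that a DB/DW system would otherwise handle.

```python
import csv
import io

# Stand-in for a flat file on disk (hypothetical contents). Under no
# coupling, the mining program reads raw text instead of querying a DB/DW.
RAW_FILE = io.StringIO(
    "cust_ID,age,income\n"
    "1,34,52000\n"
    "2,,41000\n"    # missing age
    "3,45,67000\n"
    "3,45,67000\n"  # duplicate record
)


def load_and_clean(fh):
    """Do by hand what a DB/DW system would provide: parsing, type
    conversion, dropping incomplete rows, and removing duplicates."""
    seen = set()
    rows = []
    for rec in csv.DictReader(fh):
        if not rec["age"]:
            continue  # drop tuples with a missing value
        key = rec["cust_ID"]
        if key in seen:
            continue  # no key constraint is enforced, so dedupe manually
        seen.add(key)
        rows.append(
            {"cust_ID": int(key), "age": int(rec["age"]), "income": float(rec["income"])}
        )
    return rows


rows = load_and_clean(RAW_FILE)
# A trivial "mining" step on the cleaned data: average income.
avg_income = sum(r["income"] for r in rows) / len(rows)
print(len(rows), avg_income)  # 2 59500.0
```

Dropping incomplete rows is only one possible cleaning choice; the point is that every such decision falls on the mining program.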
Loose Coupling