IFT 703 Assignment by MSC-IFT-22-0348

REG No: MSC/IFT/22/0348

Course Code: IFT 703


Course Title: Advanced Database Management System

Research Topic:
Integration of Data Mining and Object-Relational Database Systems
Submitted to:
DR. ILIYASU ADAMU

Abstract:
Data mining techniques, based on statistics and machine learning, can significantly boost the ability to
analyze large amounts of data. Despite its potential, this technology is destined to be a niche technology unless
an effort is made to integrate it with the evolving Object-Relational Database Systems. The traditional
database systems are not well suited to meet the challenges of the future. Relational models lack support for
the complex data needed by today’s enterprises, whereas the object models suffer from scalability problems.
The Object-Relational Model combines the advantages of the traditional models while overcoming their
deficiencies. This technical research paper explores the key issues, challenges and methods to enable the
seamless integration of data mining technology within the framework of the Object-Relational Database
Systems. This integration is the key to making data mining convenient to use and easy to deploy in real
applications, and to growing its user base.
Key words: Data Mining, Object-Relational Database Systems, Object-Oriented Integration Framework
for Data Mining

1. Introduction
1.1. Overview of Data Mining
Mining has always been associated with dark, bottomless pits and workers who didn't see the light of day
for hours at a time. Data mining derives its name from the similarities between searching for valuable
business information in a large database and mining a mountain for a vein of valuable ore. Both processes
require either sifting through an immense amount of material, or intelligently probing it to find exactly
where the value resides.
Data mining, as defined by Han and Kamber (2000), is the extraction of hidden predictive information from large
databases. It is a powerful new technology with great potential to help companies focus on the most
important information in their data warehouses. Data mining tools predict future trends and behaviors,
allowing businesses to make proactive, knowledge-driven decisions. The automated, prospective analyses
offered by data mining move beyond the analyses of past events provided by retrospective tools. Data
mining tools can answer business questions that traditionally were too time-consuming to resolve. They
scour databases for hidden patterns, finding predictive information that experts may miss because it lies
outside their expectations.
Most major organizations have data warehouses containing information about their clients, competitors and
products. These huge data warehouses contain gigabytes of "hidden" information that cannot easily be
found using typical database queries, giving rise to the myth that the more data you have, the less you know.
Data mining algorithms change all that by finding interesting patterns that an enterprise didn’t even know
were there.
1.2. Significance of Data Mining
A recent Gartner Group Advanced Technology Research Note listed data mining and artificial intelligence
at the top of the five key technology areas that “will clearly have a major impact across a wide range of
industries within the next 3 to 5 years”.
It also observed that, “With the rapid advance in data capture, transmission and storage, large-systems users
will increasingly need to implement new and innovative ways to mine the after-market value of their vast
stores of detail data, employing MPP [massively parallel processing] systems to create new sources of
business advantage. Within the next 2-3 years, at least half of the Fortune 1000 companies worldwide will
be using data mining technology.”
The way in which companies interact with their customers has changed dramatically over the past few
years. A customer’s continuing business is no longer guaranteed. As a result, companies have found that they
need to understand their customers better and to respond quickly to their wants and needs. In addition, the time
frame in which these responses need to be made has been shrinking. It is no longer possible to wait until the
signs of customer dissatisfaction are obvious before taking action. To succeed, companies must be
proactive and anticipate what a customer desires. It is here that data mining enters the picture and provides
this ability to organizations.

1.3. The Object-Relational Perspective


The complexity and richness of data to be handled by business applications is constantly increasing. The
explosion of the World Wide Web has made it possible to publish content that involves text, image, audio
and video data. Intranets and extranets help drive the data workflow both within a company and externally
with its partners, suppliers and customers to support its business processes.
We are in a period of intensive change and innovation regarding database technology and related products.
The pressures of a competitive marketplace are driving corporations to build and evolve their applications in
a timely and cost-effective manner. Increasingly, companies need to build applications that closely match
their business models and processes. Modern database applications need to store and manipulate objects that
are neither small nor simple, and to perform operations on these objects that are not predefined. Database
technology is entering a new period, with the emphasis quickly shifting from traditional relational
database systems towards Object-Relational Database Systems. The reason for their growing popularity is
their inherent support of object-oriented technologies.
Devarakonda (2011) defines the term as follows: “A system that includes
both object infrastructure and a set of relational extenders that exploit it is called an Object-Relational
Database System.” Object-Relational Database Systems (ORDBMS) combine the advantages of modern
object-oriented programming languages with relational database features such as multiple views of data and
a high-level, nonprocedural query language. The extenders of an Object-Relational system provide the
capabilities needed to manage today's specialized objects, and its object infrastructure gives the ability to
define new types, functions, and rules to deal with the evolving needs of businesses. ORDBMS have the
scalability and robustness of current relational database systems. In addition, ORDBMS are able to easily store
complex, structured data and large, unstructured, domain-specific data, such as text, image, audio and video.
ORDBMS allow users to define their own types, thus extending the type set of the database. Complex
data like graphics, images, videos and songs can be stored in the database directly and can also be
queried. Furthermore, ORDBMS provide features such as encapsulation of data, inheritance between types
and polymorphism (Chamberlin, 2010). These features are shown in Figure 1.

Figure 1. Features of an ORDBMS

1.4. Database Integration


Companies spend millions of dollars to build data warehouses to hold their data, and data mining techniques
must take advantage of this. Besides saving significant manual effort and storage space, integrating data
mining with the database allows data mining applications to access the most up-to-date information available.
Leading vendors such as IBM and Oracle have taken positive steps in this regard, but there is still room for
improvement.
The success of data mining as an enterprise technology depends crucially on its seamless integration
with enterprise databases, and more specifically with the newly emerging Object-Relational Database
Systems. Data mining applications must be smoothly integrated within Object-Relational Database
Systems in order to get the maximum benefit out of their inherent object orientation.
The present era is seeing sweeping changes at an unprecedented rate. In such a fast-paced world,
technologies need to keep up to date with developments in related fields or they will be rendered obsolete.
This paper envisions that, in the course of time, much of what is called data mining will end up as standard
tools built into Object-Relational databases themselves.
2. Discussion
There are two ways in which the integration between Data Mining and Object-Relational Database Systems
can be achieved.
· Amalgamation of Data Mining in the Database Schema itself
· Application of Data Mining process using a host language on the data contained in an Object-Relational
Database System.

2.1. Amalgamation of Data Mining in the Database Schema.


Although a mining model may be derived using a SQL application implementing a training algorithm on an
Object Relational Database, the database management system is completely unaware of the semantics of
mining models. The reason is that in such a case the mining models are not explicitly represented in the
database. Unless such explicit representation is enabled, the database management system capabilities
cannot be leveraged for sharing, reusing and managing mining models in a rich way. In particular, even if
several mining models have been created, there is no way for a user or an application to search the set of
available models based on their properties. Similarly, there is no suitable mechanism to indicate that a certain
model should be applied to predict a column of an unknown data set and then to query the result of the
prediction, e.g. to compare the results of predictions from two models.
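
To make the idea of an explicitly represented mining model more concrete, the following is a minimal Python sketch, not an ORDBMS feature. The names MiningModel, ModelCatalog, search and compare are hypothetical and chosen only for illustration; the two threshold rules stand in for trained models. The sketch shows the two capabilities the paragraph above says are missing: searching models by their properties and comparing the predictions of two models on the same unknown cases.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Case = Dict[str, float]


@dataclass
class MiningModel:
    """Hypothetical catalog entry that makes a mining model explicit."""
    name: str
    algorithm: str          # how the model was derived
    target_column: str      # column the model predicts
    predict: Callable[[Case], str]


class ModelCatalog:
    """Hypothetical registry; a real system would keep this in the schema."""

    def __init__(self) -> None:
        self._models: Dict[str, MiningModel] = {}

    def register(self, model: MiningModel) -> None:
        self._models[model.name] = model

    def search(self, **properties: str) -> List[MiningModel]:
        # Return every model whose attributes match all given properties.
        return [
            m for m in self._models.values()
            if all(getattr(m, key) == value for key, value in properties.items())
        ]

    def compare(self, name_a: str, name_b: str,
                cases: List[Case]) -> List[Tuple[str, str]]:
        # Apply two models to the same cases and pair up their predictions.
        a, b = self._models[name_a], self._models[name_b]
        return [(a.predict(c), b.predict(c)) for c in cases]


catalog = ModelCatalog()
catalog.register(MiningModel(
    "churn_by_age", "threshold_rule", "churn",
    lambda case: "yes" if case["age"] < 30 else "no"))
catalog.register(MiningModel(
    "churn_by_spend", "threshold_rule", "churn",
    lambda case: "yes" if case["monthly_spend"] < 20 else "no"))
catalog.register(MiningModel(
    "segment_by_spend", "threshold_rule", "segment",
    lambda case: "premium" if case["monthly_spend"] >= 40 else "basic"))

unknown_cases = [
    {"age": 25, "monthly_spend": 50.0},
    {"age": 40, "monthly_spend": 10.0},
]
churn_models = catalog.search(target_column="churn")
side_by_side = catalog.compare("churn_by_age", "churn_by_spend", unknown_cases)
```

Here churn_models holds the two models that predict the churn column, and side_by_side pairs their predictions case by case, which is the kind of query a database unaware of mining models cannot answer.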
Before we discuss how one can view mining models in ORDBMS, we need to recognize the data
representation needs related to mining. Traditional statistical learning algorithms view a data set as
consisting of (attribute, value) pairs representing “cases” (observations) on a certain entity, e.g., a customer.
Effective data mining often requires that the cases in the training set consolidate all the information relevant
to the entity. Since data in relational databases is often scattered over multiple normalized tables, this creates
a conceptual mismatch. In fact, these cases could be far better represented as objects than as mere flat records.
Traditional SQL representation also falls short in capturing metadata on columns. To effectively derive and
use a mining model, we must be able to identify properties of an attribute (e.g., discrete vs. continuous) and
relationships among attributes. The Object-Relational Model is perfect for such a scheme due to its inherent
support for objects.
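
The mismatch between normalized tables and consolidated cases can be illustrated with a small Python sketch using the standard sqlite3 module. The customer and orders tables, the column names and the AttributeInfo class are assumptions made for this example only; they do not come from any particular system. The sketch joins the scattered rows into one (attribute, value) case per customer and records, for each attribute, whether it is discrete or continuous.

```python
import sqlite3
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class AttributeInfo:
    """Hypothetical column metadata that plain SQL tables do not carry."""
    name: str
    kind: str  # "discrete" or "continuous"


CASE_SCHEMA: List[AttributeInfo] = [
    AttributeInfo("region", "discrete"),
    AttributeInfo("order_count", "continuous"),
    AttributeInfo("total_spent", "continuous"),
]


def build_cases(conn: sqlite3.Connection) -> Dict[int, dict]:
    """Consolidate rows from two normalized tables into one case per customer."""
    rows = conn.execute(
        """
        SELECT c.id, c.region, COUNT(o.id), COALESCE(SUM(o.amount), 0.0)
        FROM customer AS c
        LEFT JOIN orders AS o ON o.customer_id = c.id
        GROUP BY c.id, c.region
        ORDER BY c.id
        """
    ).fetchall()
    return {
        cid: {"region": region, "order_count": count, "total_spent": total}
        for cid, region, count, total in rows
    }


# Illustrative in-memory data; customer 2 has no orders.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customer VALUES (1, 'north'), (2, 'south');
    INSERT INTO orders VALUES (1, 1, 20.0), (2, 1, 30.0);
    """
)
cases = build_cases(conn)
continuous_attributes = [a.name for a in CASE_SCHEMA if a.kind == "continuous"]
```

The join has to be written by hand for every entity, and the discrete/continuous distinction lives outside the database, which is the gap the object-relational representation is meant to close.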
Amalgamation of Data Mining in the Object-Relational database schema would allow each case being used
for data mining to be represented as an object. The attributes of the case can be represented directly as the
attributes of the object, and the relationships between these attributes can be implemented using object
references. Traditionally, the concept of a case bears resemblance to a set of related records. Representing it
as an object would allow us to make full use of object-oriented features. Objects combine data and behavior
into a single entity and hence would provide a better conceptual understanding of the data mining
models developed.
The current data mining technology does not go beyond traditional relational data. It is not able to handle
complex data types such as images, geographical maps, movie clips, astronomical readings, recorded sounds
etc. Bringing data mining within object-relational database systems would help solve this problem. All of
these data types can be represented as objects, and indeed many systems do treat them as objects. Object
representation would allow data mining algorithms to be implemented as member functions of the object
and would thus preserve the semantics of the case.
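
As a rough illustration of a case that carries both complex data and mining behavior, consider the Python sketch below. The ImageCase class, its mean_intensity feature and the nearest-neighbour lookup are hypothetical choices made for brevity; they are not taken from any ORDBMS product, and a real image type would use far richer features.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ImageCase:
    """Hypothetical case object wrapping a small grayscale image."""
    label: str
    pixels: List[List[int]]  # rows of intensities in the range 0..255

    def mean_intensity(self) -> float:
        # Behavior that turns the complex value into a minable feature.
        flat = [p for row in self.pixels for p in row]
        return sum(flat) / len(flat)

    def distance_to(self, other: "ImageCase") -> float:
        # Similarity defined by the type itself, not by the caller.
        return abs(self.mean_intensity() - other.mean_intensity())

    def nearest(self, candidates: List["ImageCase"]) -> "ImageCase":
        # A tiny mining step (nearest neighbour) as a member function.
        return min(candidates, key=self.distance_to)


dark = ImageCase("dark", [[0, 10], [20, 30]])
bright = ImageCase("bright", [[200, 210], [220, 230]])
query = ImageCase("unknown", [[10, 20], [30, 40]])
match = query.nearest([dark, bright])
```

Because the comparison logic is a member function, the caller never needs to know how an image is stored, which is the sense in which the object preserves the semantics of the case.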
2.2. Application of Data Mining process using a host language on the data contained in an Object-Relational Database System
Although the above integration framework is desirable, it would require changes to the
underlying database system and the supervision of a database administrator (DBA) to make the
necessary modifications to the schema. A more flexible approach is to perform data mining in an existing
ORDBMS using the host language. The proposed framework would first fetch
objects from the object-relational database. Then, using the behavior of the objects, the values of interest
would be obtained. These values would form the case to which the data mining algorithm would be applied.
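
The three steps of this host-language framework (fetch objects, query their behavior, apply an algorithm to the resulting cases) can be sketched in Python as follows. Everything here is an assumption for illustration: fetch_objects stands in for a real query against an ORDBMS, the Customer class is invented, and the "above average spender" rule is only a placeholder for an actual mining algorithm.

```python
from typing import List, Tuple

CaseVector = Tuple[float, float]


class Customer:
    """Hypothetical object type as it might be fetched from the database."""

    def __init__(self, name: str, purchases: List[float]) -> None:
        self.name = name
        self.purchases = purchases

    def total_spent(self) -> float:
        return sum(self.purchases)

    def purchase_count(self) -> int:
        return len(self.purchases)


def fetch_objects() -> List[Customer]:
    # Step 1: stand-in for fetching objects from the object-relational database.
    return [
        Customer("ann", [10.0, 15.0]),
        Customer("bob", [200.0]),
        Customer("cy", []),
    ]


def to_case(customer: Customer) -> CaseVector:
    # Step 2: use the object's behavior to obtain the values of interest.
    return (customer.total_spent(), float(customer.purchase_count()))


def above_average_spenders(cases: List[CaseVector]) -> List[int]:
    # Step 3: placeholder "mining" step applied to the cases.
    mean_total = sum(total for total, _ in cases) / len(cases)
    return [i for i, (total, _) in enumerate(cases) if total > mean_total]


customers = fetch_objects()
cases = [to_case(c) for c in customers]
flagged = [customers[i].name for i in above_average_spenders(cases)]
```

No schema change is needed for this approach: only to_case depends on the object type, so a different mining step can be substituted without touching the database.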
To illustrate the above process, let us consider the issues associated with data mining in the object-relational
Oracle8i. The SQL used in Oracle8i supports object-relational concepts. We can create objects and store
them in the database. A SQL query can return objects as its results. There are two ways in which Oracle8i
stores objects: either as attributes in a column, or in an object table, which is a special kind of table that
essentially maps each attribute of an object to a column in the table.
In the first case, when an object is stored as an attribute, the object is fetched into an object variable of the host
language. The object is then queried about its state using its member functions. After that, any data mining
algorithm can be applied to the object. For our case, we made use of the clustering algorithm. The clustering
algorithm is an expectation method that uses iterative refinement techniques to group records into
neighborhoods (clusters) that exhibit similar, predictable characteristics. Often, these characteristics are
hidden or non-intuitive and cannot be seen through normal SQL queries.
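
The paper does not name the specific clustering algorithm used. As a simple stand-in for iterative refinement, the Python sketch below implements plain k-means, which alternates between assigning each point to its nearest centroid and recomputing the centroids. Seeding the centroids with the first k points is an assumption made here to keep the example deterministic; it is not a recommended initialization.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points: Sequence[Point], k: int,
            max_iter: int = 100) -> Tuple[List[int], List[Point]]:
    """Group points into k clusters by iterative refinement (plain k-means)."""
    centroids: List[Point] = list(points[:k])  # deterministic seeding
    assignment: List[int] = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the refinement has converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return assignment, centroids


# Illustrative cases, e.g. (visits, spend) values obtained from fetched objects.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.0), (1.0, 0.5)]
assignment, centroids = k_means(points, k=2)
```

On this small data set the loop converges after one refinement pass, separating the three low-valued cases from the two high-valued ones.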
If the objects are stored in object tables, a different strategy can be used. Such objects have a unique id
through which they can be referenced. Oracle8i provides an extension to SQL through the VALUE operator,
which returns each row as an object. Oracle8i also allows us to implement methods and operator overloading.
Since data mining algorithms make heavy use of comparison operators, the ability to define an ORDER method
for each object type in Oracle8i is of great assistance. Oracle8i also allows storing references to
objects. In this case the reference has to be de-referenced to get the desired object. Once we have the desired
object, we can proceed as described in the preceding paragraph.
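
The two ideas in this paragraph, a type-defined ordering and de-referencing of stored references, have close analogues in a host language. The Python sketch below is only an analogy and does not use any Oracle API: the Account and ObjectRef classes and the dictionary standing in for an object table are hypothetical. The comparison methods play the role that an ORDER method plays for a database object type.

```python
from functools import total_ordering
from typing import Dict, List


@total_ordering
class Account:
    """Hypothetical object type with a user-defined ordering."""

    def __init__(self, owner: str, balance: float) -> None:
        self.owner = owner
        self.balance = balance

    def __eq__(self, other: object) -> bool:
        return isinstance(other, Account) and self.balance == other.balance

    def __lt__(self, other: "Account") -> bool:
        # Accounts are ordered by balance; total_ordering derives <=, >, >=.
        return self.balance < other.balance


class ObjectRef:
    """Hypothetical reference holding an object id instead of the object."""

    def __init__(self, table: Dict[int, Account], object_id: int) -> None:
        self._table = table
        self._object_id = object_id

    def deref(self) -> Account:
        # De-reference: look the object up by its unique id.
        return self._table[self._object_id]


# A dictionary keyed by object id stands in for an object table.
object_table: Dict[int, Account] = {
    101: Account("ann", 50.0),
    102: Account("bob", 20.0),
    103: Account("cy", 75.0),
}
refs: List[ObjectRef] = [ObjectRef(object_table, oid) for oid in (101, 102, 103)]

accounts = [ref.deref() for ref in refs]
ranked_owners = [a.owner for a in sorted(accounts)]
richest = max(accounts).owner
```

Once the ordering is defined on the type, generic routines such as sorted and max work on the objects directly, which is what makes overloaded comparison useful to mining algorithms.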
2.3. Problems and Difficulties in the Integration
· Object-Relational Database Systems are yet to be implemented in their true form as visualized by the
father of the technology, Mr. Stonebraker. Until this is done, the dream of true integration will remain
unfulfilled.
· Data Mining normally involves operations on very large sets of data. Object-Relational Database Systems
suffer a performance loss when dealing with such large numbers of persistent objects.
· Independently, both Data Mining and Object-Relational Database Systems are still evolving, and this
continues to make it difficult to bring them into a common framework.
· Some relational advocates, such as C. J. Date (2002), are of the opinion that there is no real need for
Object-Relational Databases and that the relational model should simply be extended to incorporate
user-defined types. These people have a wide-ranging influence in database circles and continue to
influence research trends. This could be a detrimental factor for the proposed integration.
· Since most existing data is stored in relational database systems, a total shift towards Object-Relational
Database Systems will take some time, which slows the integration efforts.
· Data Mining and the Object-Relational Database model continue to be a subject of research in academia, but
unless serious efforts are made to bring the benefits of these technologies to real-world businesses, no
real progress will be visible.
· Many data mining algorithms use complex mathematical and statistical techniques that are not easily
mapped into human terms. We found this to be a strong deterrent to understanding.
3. Conclusion
Object-relational database systems are fast gaining popularity in the industry and are replacing the
traditional relational databases, due to their inherent support of object-oriented technologies. Data mining
techniques are currently optimized for relational database systems and must evolve to work with
object-relational database systems. Independently, both of these technologies are becoming prerequisites of
doing business in the new economy, and their integration is the next logical step.
This research paper proposes two schemes for the integration of data mining and object-relational database
systems. These are:
· Amalgamation of Data Mining in the Database Schema itself
· Application of Data Mining process using a host language on the data contained in an Object-Relational
Database System
The first scheme proposes to represent the data mining procedure as an object in the database schema thus
preserving the semantics of the cases. The second scheme explores how data mining can be used to obtain
valuable information in existing Object-Relational databases through the host language.
The research concludes that the benefits offered by data mining to businesses, and by object-relational
database systems to modeling semantic data on better conceptual grounds, justify the efforts for the
integration of these two technologies. In this regard, this research puts forth the above integration framework
for further analysis and study.
Data mining techniques should start to take advantage of Object-Relational Database Systems. With the
explosion of the Internet, the electronic highway is really becoming an object-based highway. Changing
times call for changing measures, and the field of data mining should also be alive to this. The integration of
data mining in Object-Relational Database Systems would go a long way towards laying the foundations of a
semantic web and making possible the vision of distributed data mining applications communicating with
each other through XML across dissimilar systems. This is what the authors of this paper visualize for the
coming years.

4. References
Chamberlin, D. D., 2010. “Anatomy of an Object-Relational Database”, http://www.db2mag.com.
Date, C. J., 2002. An Introduction to Database Systems, 7th Edition, Addison Wesley.
Devarakonda, R. S., 2011. “Object-Relational Database Systems - The Road Ahead”, http://www.acm.org/crossroads.
Han, J. and Kamber, M., 2000. Data Mining: Concepts and Techniques, New York: Morgan Kaufmann.