Explain Multirelational Data Mining Concept in Detail

Multirelational data mining aims to discover patterns involving multiple tables from a relational database. It includes tasks like multirelational classification, clustering, and frequent pattern mining. Spatial data mining applies data mining techniques to spatial data models and uses geographical information to produce business intelligence. It involves techniques to transform geographic data into useful formats and extract non-trivial patterns rather than just visualizing data. An example is using density-based clustering on fatal car accident data to detect systemic safety issues near road segments with dense accident clusters.

Uploaded by

anirudh devaraj

JSS SCIENCE AND TECHNOLOGY UNIVERSITY

MYSURU-570006
Department of Information Science and Engineering

Advanced Data Mining Techniques Assignment

Submitted by:
Anirudh D (01JST19PSE001)
1. Explain Multirelational Data mining concept in detail.

Multirelational data mining searches for patterns that involve multiple tables (relations) from a
relational database and aims to discover knowledge directly from relational data. Its tasks
include multirelational classification, clustering, and frequent pattern mining.
Multirelational classification aims to build a classification model that utilizes information
in different relations. Multirelational clustering aims to group tuples into clusters using
their own attributes as well as tuples related to them in different relations. Multirelational
frequent pattern mining aims at finding patterns involving interconnected items in different
relations.

Relational databases are the most popular repository for structured data. In a relational
database, multiple relations are linked together via entity-relationship links. Many
classification approaches (such as neural networks and support vector machines) can only be
applied to data represented in a single table. While most existing data mining approaches look
for patterns in a single data table, multirelational data mining approaches look for patterns
that involve multiple tables (relations) from a relational database. A relational database
consists of a collection of named tables, often referred to as relations, each of which
individually behaves like the single table that is the subject of propositional data mining.

Fig 1: Multirelational framework


Approaches supported by multirelational data mining (MRDM):

• Inductive Logic Programming (ILP)
• Multi-relational Clustering
• Probabilistic Relational Model

MRDM is a multidisciplinary field that deals with knowledge discovery from relational
databases consisting of a number of relations. It is a framework that gathers the data about
the data (metadata) from a database and chooses the best approach to get optimal results.
MRDM aims to integrate results from existing fields such as ILP, KDD, statistics, and machine
learning. Data understanding means gathering the metadata from the database, which helps
determine the best approach to the analysis.

Ex: Consider a relational database whose schema is drawn with arrows going from primary keys
to the corresponding foreign keys. Suppose the target relation is Loan. Each target tuple is
either positive or negative, indicating whether the loan is paid on time. The task of
multirelational classification is to build a hypothesis that distinguishes positive from
negative target tuples, using information in different relations. The most popular form of
hypothesis for multirelational classification is a set of rules.
In a database for multirelational data mining, there is one target relation, whose tuples are
called target tuples and are associated with class labels. The other relations are nontarget
relations. Each relation may have one primary key (which uniquely identifies tuples in the
relation) and several foreign keys (where a primary key in one relation can be linked to the
foreign key in another). If we assume a two-class problem, then we pick one class as the
positive class and the other as the negative class. MRDM approaches can run within actual
database management systems and use the query optimization techniques of the DBMS to improve
efficiency.
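
The Loan example can be sketched in code. The following is a minimal illustration, assuming a hypothetical two-table schema (`loan` as the target relation, `account` as a nontarget relation) and a hypothetical candidate rule "a loan is positive if its account has monthly frequency". The column names, the rule, and the sample rows are all invented for illustration. The sketch follows the foreign key from `loan` to `account` and counts how many positive and negative target tuples the rule covers, which is the basic quantity a rule-based multirelational classifier needs when it evaluates a rule.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical Loan/Account schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE account (account_id INTEGER PRIMARY KEY, frequency TEXT);
        CREATE TABLE loan (
            loan_id INTEGER PRIMARY KEY,
            account_id INTEGER REFERENCES account(account_id),
            amount INTEGER,
            paid_on_time INTEGER  -- class label: 1 = positive, 0 = negative
        );
        INSERT INTO account VALUES (1, 'monthly'), (2, 'weekly'), (3, 'monthly');
        INSERT INTO loan VALUES
            (10, 1, 5000, 1), (11, 1, 9000, 1),
            (12, 2, 7000, 0), (13, 3, 3000, 1), (14, 3, 8000, 0);
    """)
    return conn


def rule_coverage(conn):
    """Count target tuples covered by the (hypothetical) rule
    Loan(L, +) :- Loan(L, A), Account(A, frequency = 'monthly').

    Returns (covered positives, covered negatives).
    """
    query = """
        SELECT l.paid_on_time, COUNT(*)
        FROM loan AS l
        JOIN account AS a ON a.account_id = l.account_id  -- follow the foreign key
        WHERE a.frequency = 'monthly'
        GROUP BY l.paid_on_time
    """
    counts = {1: 0, 0: 0}
    for label, n in conn.execute(query):
        counts[label] = n
    return counts[1], counts[0]


conn = build_demo_db()
pos, neg = rule_coverage(conn)
print(f"rule covers {pos} positive and {neg} negative target tuples")
```

Because the join and the count run inside the database engine, the DBMS query optimizer does the work, which is the efficiency point made above.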

2. Illustrate Multidimensional Analysis and Descriptive Mining of Complex Data Objects

A major limitation of many commercial data warehouse and OLAP tools for
multidimensional database analysis is their restriction on the allowable data types for
dimensions and measures. Most data cube implementations confine dimensions to
nonnumeric data and measures to simple aggregated values. To introduce data mining and
multidimensional data analysis for complex objects, we examine how to perform
generalization on complex structured objects and construct object cubes for OLAP and
mining in object databases. The storage and access of complex structured data have been
studied in object-relational and object-oriented database systems. These systems organize a
large set of complex data objects into classes, which are in turn organized into class/subclass
hierarchies. Each object in a class is associated with:

1) An object identifier

2) A set of attributes that may contain sophisticated data structures, set- or list-valued data,
class composition and hierarchies, and multimedia data

3) A set of methods that specify the computational routines or rules associated with the object
class.

To facilitate generalization and induction in object-relational and object-oriented databases,
it is important to know how the generalized data can be used for multidimensional data
analysis and data mining. Descriptive analysis is applied to discover interesting subgroups
within the bulk of the data.
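
As a rough illustration of generalizing a complex object, the sketch below models an object with an identifier, a set-valued attribute, and a method, then generalizes the set-valued attribute through a concept hierarchy and counts objects per generalized concept, as one cell count of an object cube would. The `Person` class, the `hobbies` attribute, and the hierarchy are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical concept hierarchy: low-level value -> higher-level concept.
CONCEPT_OF = {
    "tennis": "sports",
    "hockey": "sports",
    "violin": "music",
    "piano": "music",
}


@dataclass
class Person:
    oid: int                                   # object identifier
    hobbies: set = field(default_factory=set)  # set-valued attribute

    def generalized_hobbies(self):
        """Method: map each hobby to its parent concept; unknown values stay as-is."""
        return {CONCEPT_OF.get(h, h) for h in self.hobbies}


def concept_counts(people):
    """Count how many objects fall under each generalized concept."""
    counts = {}
    for p in people:
        for concept in p.generalized_hobbies():
            counts[concept] = counts.get(concept, 0) + 1
    return counts


people = [
    Person(1, {"tennis", "hockey", "violin"}),
    Person(2, {"piano"}),
    Person(3, {"hockey"}),
]
print(concept_counts(people))
```

Each object is counted once per concept, so a person with two sports hobbies still contributes a single count to "sports".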

Thus, the extraction of meaningful feature representations yields a variety of different views
on the same set of data objects. Each of these views or representations might focus on a
different aspect and may offer another notion of similarity. However, in almost any
application there is no universal feature representation that can be used to express similarity
between all possible objects in a meaningful way. Thus, recent data mining approaches
employ multiple representations to achieve more general results that are based on a variety of
aspects.

Descriptive mining is used to mine data and provide up-to-date information on past or recent
events. It identifies what happened in the past by analysing stored data. Practical analysis
methods here include standard reporting, query/drill-down, and ad-hoc reporting. Descriptive
mining focuses on summarizing the data and converting it into meaningful information for
reporting and monitoring.
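
Summarization and drill-down can be sketched with a simple group-by. The records, field positions, and the `summarize` helper below are hypothetical; the point is only that the same stored data yields a coarse report at one grouping level and a finer one when a further field is added to the grouping key.

```python
from collections import defaultdict

# Hypothetical stored records: (region, month, sales amount).
RECORDS = [
    ("north", "jan", 120), ("north", "feb", 80),
    ("south", "jan", 200), ("south", "feb", 50), ("south", "feb", 70),
]


def summarize(records, key_fields):
    """Sum the amount grouped by the selected fields (0 = region, 1 = month)."""
    totals = defaultdict(int)
    for rec in records:
        key = tuple(rec[i] for i in key_fields)
        totals[key] += rec[2]
    return dict(totals)


by_region = summarize(RECORDS, [0])      # standard report: one total per region
drill_down = summarize(RECORDS, [0, 1])  # drill down: region and month
print(by_region)
print(drill_down)
```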

An example application for multi-represented objects is data mining in protein data. A protein
can be described by multiple feature transformations based upon its amino acid sequence, its
secondary or its three-dimensional structure. Another example is data mining in image data
which might be represented by texture features, color histograms or text annotations. Mining
multi-represented objects yields advantages because more information can be incorporated
into the mining process. On the other hand, the additional information has to be used
carefully since too much information might distort the derived patterns. Two problems can be
distinguished when clustering multi-represented objects: comparability and semantics. The
comparability problem covers the effects that arise when comparing features, distances, or
statements from different representations.
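
One simple way to see the comparability problem is that distances from different representations live on different scales. The sketch below is an illustrative assumption, not a method taken from the text: it rescales each representation's distance by the largest pairwise distance observed in that representation, then takes a weighted average. The representation names (`color`, `texture`), the weights, and the sample objects are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def scale_of(objects, rep):
    """Largest pairwise distance within one representation (1.0 if all coincide)."""
    dists = [euclidean(o[rep], p[rep])
             for i, o in enumerate(objects) for p in objects[i + 1:]]
    return max(dists, default=0.0) or 1.0


def combined_distance(a, b, scales, weights):
    """Weighted average of per-representation distances, each rescaled to [0, 1]."""
    total_weight = sum(weights.values())
    return sum(w * euclidean(a[rep], b[rep]) / scales[rep]
               for rep, w in weights.items()) / total_weight


# Hypothetical image objects with two representations on very different scales.
OBJECTS = [
    {"color": (0.0, 0.0), "texture": (0.0, 0.0)},
    {"color": (0.3, 0.4), "texture": (300.0, 400.0)},
    {"color": (0.03, 0.04), "texture": (30.0, 40.0)},
]
WEIGHTS = {"color": 1.0, "texture": 1.0}
SCALES = {rep: scale_of(OBJECTS, rep) for rep in WEIGHTS}

print(combined_distance(OBJECTS[0], OBJECTS[1], SCALES, WEIGHTS))
print(combined_distance(OBJECTS[0], OBJECTS[2], SCALES, WEIGHTS))
```

Without the rescaling step, the texture distances (in the hundreds) would swamp the color distances (below one), so the combined result would effectively ignore one representation.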

3. Paraphrase on spatial data mining using an example.

Spatial data mining (SDM) is the application of data mining to spatial models. In spatial data
mining, analysts use geographical or spatial information to produce business intelligence or
other results. This requires specific techniques and resources to get the geographical data
into relevant and useful formats. The term generally refers to finding useful and non-trivial
patterns in data. In other words, just setting up a visual map of geographic data may not be
considered spatial data mining by experts. The core goal of a spatial data mining project is to
sift the information in order to build real, actionable patterns to present, excluding
things like statistical coincidence, randomized spatial modelling, or irrelevant results. SDM
aims to improve human ability to extract knowledge and insights from large and complex
collections of digital data. It efficiently extracts previously unknown, potentially useful, and
ultimately understandable knowledge from these huge datasets for a given task. The SDM
method relies on the traditional theories of mathematical statistics, machine learning,
pattern recognition, neural networks, and artificial intelligence, and also draws on
methods such as data fields, cloud models, and decision trees.

Fig 2. Spatial data mining process

Ex: Consider exploring fatal car accident data with the spatial data mining process. Finding
and prioritizing locations where systemic issues result in multiple fatal car accidents is a
crucial need for transportation agencies that run operations and guide safety policy. A
spatial approach can help us expand beyond our basic understanding of where fatal car
accidents occur and start detecting patterns.

Many fatal car accidents result from seemingly random events, but a dense cluster of fatal car
accidents near a specific road segment can suggest the presence of a systemic problem or
human-driven process that can greatly benefit from targeted safety measures. This
methodology uses a Spatial Statistics tool named Density-Based Clustering to find dense
clusters of fatal car accidents.
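
To make the idea of density-based clustering concrete, here is a minimal sketch of DBSCAN, one common density-based algorithm. The text does not say which algorithm the tool was run with, so DBSCAN is an assumption, as are the parameter names `eps` (search radius) and `min_pts` (minimum neighbours, counting the point itself) and the sample coordinates, which stand in for accident locations in projected metres.

```python
import math


def region_query(points, i, eps):
    """Indices of all points within eps of point i (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE            # may later become a border point
            continue
        cluster += 1                     # point i is a core point: start a cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster      # border point: join, but do not expand
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            more = region_query(points, j, eps)
            if len(more) >= min_pts:     # j is also a core point: keep expanding
                queue.extend(more)
    return labels


# Hypothetical accident locations (x, y) in metres.
ACCIDENTS = [
    (0, 0), (10, 0), (0, 10), (10, 10),    # dense group near one road segment
    (500, 500), (510, 500), (500, 510),    # second dense group
    (1000, 0),                             # isolated accident
]
print(dbscan(ACCIDENTS, eps=20, min_pts=3))
```

Isolated accidents end up labelled as noise, which matches the distinction above between seemingly random events and dense clusters that hint at a systemic problem.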

Workflow: Find Clusters (Density-Based Clustering) -> Categorize Clusters -> Prioritize Clusters (Normalizing and Indexing)

To organize and prioritize each cluster as a candidate safety measure project, we then
consider the characteristics of each detected cluster and the transportation network:

• The elements of the transportation network at the cluster's location, such as the
presence of an intersection or the posted speed limit, become the basis for classifying the
clusters into groups that can be addressed by common safety measures. An example
group would be "Intersection and Traffic Light Clusters".
• The number of fatal accidents at the cluster and the traffic counts at the location then
become the basis of the prioritization rankings.
• These clusters become our candidate priority locations, and we then add further
characteristics to each cluster in order to categorize and prioritize it.
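
The categorize and prioritize steps can be sketched as follows. The exact normalization and indexing formula is not given in the text, so this sketch assumes one plausible choice: fatal accidents divided by the daily traffic count, rescaled so the highest-rate cluster gets an index of 1.0. All field names, the category labels other than "Intersection and Traffic Light Clusters", and the sample values are hypothetical.

```python
def categorize(cluster):
    """Group a cluster by the network elements at its location."""
    if cluster["at_intersection"] and cluster["has_traffic_light"]:
        return "Intersection and Traffic Light Clusters"
    if cluster["at_intersection"]:
        return "Intersection Clusters"   # hypothetical group name
    return "Road Segment Clusters"       # hypothetical group name


def prioritize(clusters):
    """Rank clusters by fatal accidents per unit of traffic (assumed formula).

    Returns (name, category, index) tuples, highest priority first, where
    index is the cluster's rate divided by the highest rate.
    Assumes every cluster has a positive traffic count.
    """
    if not clusters:
        return []
    rates = [c["fatal_accidents"] / c["daily_traffic"] for c in clusters]
    top = max(rates)
    ranked = sorted(zip(clusters, rates), key=lambda cr: cr[1], reverse=True)
    return [(c["name"], categorize(c), rate / top) for c, rate in ranked]


# Hypothetical detected clusters.
CLUSTERS = [
    {"name": "A", "fatal_accidents": 6, "daily_traffic": 20000,
     "at_intersection": True, "has_traffic_light": True},
    {"name": "B", "fatal_accidents": 4, "daily_traffic": 5000,
     "at_intersection": False, "has_traffic_light": False},
    {"name": "C", "fatal_accidents": 3, "daily_traffic": 30000,
     "at_intersection": True, "has_traffic_light": False},
]
for name, category, index in prioritize(CLUSTERS):
    print(f"{name}: {category}, priority index {index:.3f}")
```

Dividing by traffic keeps a busy road with many accidents from automatically outranking a quiet road with a disproportionately high accident count.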
