Database Integration

This document discusses database integration and schema matching. It defines distributed, heterogeneous, and multi-databases. Database integration involves combining information from multiple autonomous databases to answer queries using the combined information. Schema matching is the process of taking two database schemas and producing a mapping between corresponding elements. It allows for merging schemas into a global schema. Schema matching considers attributes like name, description, data type and relationships to determine 1:1, 1:m or m:n matches between elements.


Database Integration

Lecture 10
Instructor: Mehwashma Amir
A database system is composed of two elements:
• DBMS
• Database
• A schema describes the actual data structures and organization within
the system.
History
• During the 1970s, centralized databases were predominant, but later
innovations in communications and database technologies revolutionized
data processing, giving rise to a new generation of decentralized
database systems, i.e. distributed databases.
• A fundamental distinction must first be drawn between distributed,
heterogeneous, and multidatabase systems.
• Distributed database: A distributed database system is made up of a single
logical database that is physically distributed across a computer network,
together with a distributed database management system that answers
queries and applies updates consistently. It is homogeneous (all its physical
components run the same distributed database management system).
• Heterogeneous database: A heterogeneous database system is a
distributed database system that includes heterogeneous components at
the database level; these may include a variety of data models, query
languages, schemas, and access methods.
• Multidatabase: A multidatabase system is a collection of loosely coupled
component databases, with no unified schema applied for their integration.
Motivation for Multidatabases
• A large organization has several departments each making
autonomous decisions.
• Widespread heterogeneity arises naturally from a free market of
ideas and products, some of which prove to be more widely adapted
than others to specific applications.
So database integration is:
• Combining information from multiple autonomous information sources
• And answering queries using the combined information
• Database integration conceptually combines participating databases
to form a single cohesive, interoperable multidatabase. Such a
multidatabase is capable of providing uniform user access interfaces to
the component heterogeneous distributed database systems.
• Multidatabase systems combine autonomous and heterogeneous
component (or local) database systems into a global database system.

Database Integration
• The important task in the integration process is how to merge
two different databases that use different data models.
Three types
• System integration: enables data to be accessed from more than
one database.
• Schema integration: provides a uniform global conceptual view of
the multidatabase.
• Semantic integration: resolves data conflicts which might exist
between component databases.
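System and schema integration can be illustrated with a small sketch. It assumes two SQLite databases standing in for the component databases; the database alias `crm`, the table contents, and the `global_customers` helper are hypothetical names chosen for illustration, not part of the lecture.

```python
import sqlite3

# System integration: one connection that can reach two component databases.
conn = sqlite3.connect(":memory:")                  # first component ("main")
conn.execute("ATTACH DATABASE ':memory:' AS crm")   # second component

# Each component keeps its own local schema.
conn.execute("CREATE TABLE main.Cust (CNo INTEGER, CompName TEXT)")
conn.execute("CREATE TABLE crm.Customer (CustID INTEGER, Company TEXT)")
conn.execute("INSERT INTO main.Cust VALUES (1, 'Acme')")
conn.execute("INSERT INTO crm.Customer VALUES (2, 'Globex')")

def global_customers(connection):
    """Schema integration: expose both tables under one global view (id, company)."""
    query = """
        SELECT CNo AS id, CompName AS company FROM main.Cust
        UNION ALL
        SELECT CustID AS id, Company AS company FROM crm.Customer
        ORDER BY id
    """
    return connection.execute(query).fetchall()

rows = global_customers(conn)
```

Semantic integration is not shown here; it would additionally have to reconcile conflicting values for the same real-world customer.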
Schema matching
• Fundamental problem:
schema matching, which takes two (or more) database schemas to
produce a mapping between elements (or attributes) of the two
(or more) schemas that correspond semantically to each other.
• Objective: merge the schemas into a single global schema
Integrating Two Schemas
• Represent the mapping with a similarity relation, ≅, over the power
sets of S1 and S2, where each pair in ≅ represents one element of
the mapping. E.g.,

Cust.CNo ≅ Customer.CustID
Cust.CompName ≅ Customer.Company
{Cust.FirstName, Cust.LastName} ≅ Customer.Contact
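
One possible way to hold such a mapping in code is a set of pairs, where each side of a pair is a subset of one schema's elements. This is only a sketch; the names `mapping` and `matches_for` are assumptions made for illustration.

```python
# Each pair relates a subset of S1's elements to a subset of S2's elements.
# frozenset keeps the subsets hashable so the pairs can be stored in a set.
mapping = {
    (frozenset({"Cust.CNo"}), frozenset({"Customer.CustID"})),
    (frozenset({"Cust.CompName"}), frozenset({"Customer.Company"})),
    (frozenset({"Cust.FirstName", "Cust.LastName"}),
     frozenset({"Customer.Contact"})),
}

def matches_for(element, pairs):
    """Return every S2 element related to the given S1 element."""
    found = set()
    for left, right in pairs:
        if element in left:
            found |= right
    return found
```

Looking up `Cust.FirstName` or `Cust.LastName` returns the same target, because both belong to the same left-hand subset.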

Different types of matching
• Schema-level only matching: only schema information is considered.
• Domain and instance-level only matching: some instance data (data
records) and possibly the domain of each attribute are used. This
case is quite common on the Web.
• Integrated matching of schema, domain and instance data: Both
schema and instance data (possibly domain information) are
available.
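Instance-level matching can be sketched by comparing the values stored in two columns. The example below uses Jaccard overlap of distinct values as the similarity measure; this measure, the sample data, and the function names are assumptions for illustration rather than the method prescribed by the lecture.

```python
def jaccard(values_a, values_b):
    """Jaccard overlap of two columns' distinct values (0.0 if both are empty)."""
    a, b = set(values_a), set(values_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical instance data: column name -> stored values.
s1_data = {"CNo": [1, 2, 3, 4],
           "CompName": ["Acme", "Globex", "Initech", "Umbrella"]}
s2_data = {"CustID": [2, 3, 4, 5],
           "Company": ["Globex", "Initech", "Hooli", "Acme"]}

def instance_candidates(cols1, cols2, threshold=0.5):
    """Return (col1, col2, score) for column pairs whose values overlap enough."""
    out = []
    for name1, values1 in cols1.items():
        for name2, values2 in cols2.items():
            score = jaccard(values1, values2)
            if score >= threshold:
                out.append((name1, name2, score))
    return sorted(out, key=lambda item: -item[2])
```

No column names are compared here, which is why this style of matching still works when only data records are available.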
Schema level matching
• Schema level matching relies on information such as name,
description, data type, relationship type
• Match cardinality:
• 1:1 match: one element in one schema matches one element of another
schema.
• 1:m match: one element in one schema matches m elements of another
schema.
• m:n match: m elements in one schema match n elements of another
schema.

Example:
m:1 match is similar to 1:m match. m:n match is complex, and there is little work on it.
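
A minimal sketch of schema-level 1:1 matching follows. It uses only element names and data types, scores name similarity with Python's standard `difflib.SequenceMatcher`, and skips pairs whose types differ. The sample schemas, the threshold, and the function names are assumptions for illustration; real matchers also use descriptions and relationship types.

```python
from difflib import SequenceMatcher

# Hypothetical schemas: element name -> data type.
s1 = {"CNo": "int", "CompName": "varchar", "FirstName": "varchar"}
s2 = {"CustID": "int", "Company": "varchar", "Contact": "varchar"}

def name_similarity(a, b):
    """Case-insensitive string similarity in the range [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def schema_level_candidates(schema1, schema2, threshold=0.5):
    """Return 1:1 candidates (name1, name2, score) with equal data types."""
    out = []
    for name1, type1 in schema1.items():
        for name2, type2 in schema2.items():
            if type1 != type2:
                continue  # incompatible data types cannot match
            score = name_similarity(name1, name2)
            if score >= threshold:
                out.append((name1, name2, score))
    return sorted(out, key=lambda item: -item[2])
```

Name similarity alone is a weak signal: a pair such as `CNo` and `CustID` may score low even though the elements correspond, so the results are best treated as candidates.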

What does schema matching do


• Given two schemas
• Returns how each element from each schema is related (=, <=, is-a,
part-of, overlap (set), contain (set), etc.)
• It is impossible to determine all matches fully automatically. At best,
we can infer match candidates, which users can accept,
reject or change.
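The accept, reject or change step can be sketched as a small review function. The `Candidate` class, the shape of the `decisions` dictionary, and the sample element names are hypothetical and serve only to illustrate the workflow.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    source: str   # element of the first schema
    target: str   # proposed element of the second schema
    score: float  # matcher's confidence

def review(candidates, decisions):
    """Apply user decisions to match candidates.

    decisions maps (source, target) to ("accept",), ("reject",) or
    ("change", new_target). Candidates without a decision stay pending.
    """
    accepted, pending = {}, []
    for cand in candidates:
        decision = decisions.get((cand.source, cand.target))
        if decision is None:
            pending.append(cand)
        elif decision[0] == "accept":
            accepted[cand.source] = cand.target
        elif decision[0] == "change":
            accepted[cand.source] = decision[1]
        # a "reject" decision simply drops the candidate
    return accepted, pending
```

Keeping undecided candidates in a separate pending list lets the user work through the matcher's suggestions incrementally.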
Issues
• When matching a large number of schemas, statistical approaches
such as data mining can be used, rather than only doing pair-wise
match.
Schema matching tools
• IBM Rational Data Architect
• Microsoft BizTalk
• COMA++
Motivation
• If Microsoft takes over Yahoo! successfully,
tons of DB schemas will need to be mediated. Integration would take several
weeks or months if done manually.
