Topic 7
Data Integration
Data integration is the process of retrieving heterogeneous data and combining it into a unified form and
structure.
It implies the ability to access any source of data within or outside your enterprise.
It allows users, organizations and applications to merge different types of data (such as data sets,
documents and tables) for use in personal or business processes and/or functions.
A typical request for integrated data proceeds as follows:
1. The client sends a request for integrated data to a master web server.
2. The web server sends requests for individual data sets to the appropriate databases.
3. The databases respond by sending the requested data back to the master server.
4. The master server combines the data into a unified view.
5. The master server sends the unified view to the client.
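The five steps above can be sketched in Python. This is a minimal illustration, assuming two in-memory SQLite databases stand in for the separate data sources; the table names, columns and the `master_server` function are hypothetical and stand in for a real web server.

```python
import sqlite3


def build_sources() -> dict[str, sqlite3.Connection]:
    """Create two independent (hypothetical) databases with different contents."""
    crm = sqlite3.connect(":memory:")
    crm.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    crm.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])

    billing = sqlite3.connect(":memory:")
    billing.execute("CREATE TABLE invoices (customer_id INTEGER, amount REAL)")
    billing.executemany(
        "INSERT INTO invoices VALUES (?, ?)", [(1, 120.0), (1, 30.0), (2, 75.5)]
    )
    return {"crm": crm, "billing": billing}


def master_server(sources: dict[str, sqlite3.Connection], customer_id: int) -> dict:
    # Step 2: send an individual request to each appropriate database.
    name_row = sources["crm"].execute(
        "SELECT name FROM customers WHERE id = ?", (customer_id,)
    ).fetchone()
    total_row = sources["billing"].execute(
        "SELECT COALESCE(SUM(amount), 0) FROM invoices WHERE customer_id = ?",
        (customer_id,),
    ).fetchone()
    # Steps 3-4: the databases have answered; combine the answers into one view.
    return {
        "customer_id": customer_id,
        "name": name_row[0] if name_row else None,
        "total_invoiced": total_row[0],
    }


# Steps 1 and 5: the client asks the master server and receives the unified view.
sources = build_sources()
print(master_server(sources, 1))
# {'customer_id': 1, 'name': 'Ada', 'total_invoiced': 150.0}
```

The client never talks to the individual databases; only the master server knows which source holds which part of the answer.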
Intermediate Level
The goal is to support user-driven access and/or integration of data from multiple
databases.
The term user-driven means that users can manipulate data from several sources simultaneously
in a uniform way.
To implement such a framework, a software layer is developed whose functionality ranges from a
multi-database query language (e.g. OEM-QL, MySQL) to a single SQL-like syntax that is understood
by a set of translators.
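The "single SQL-like syntax understood by a set of translators" idea can be sketched as follows. This does not implement OEM-QL; it assumes a tiny hypothetical grammar (`SELECT c1, c2 FROM t WHERE f = 'v'`) and two hypothetical translators, one that turns the query into real SQL for SQLite and one that turns the same query into a filter over CSV text.

```python
import csv
import io
import re
import sqlite3
from dataclasses import dataclass


@dataclass
class Query:
    columns: list[str]
    table: str
    field: str
    value: str


# Hypothetical restricted grammar: SELECT c1, c2 FROM t WHERE f = 'v'
QUERY_RE = re.compile(
    r"SELECT\s+(.+?)\s+FROM\s+(\w+)\s+WHERE\s+(\w+)\s*=\s*'([^']*)'",
    re.IGNORECASE,
)


def parse(text: str) -> Query:
    m = QUERY_RE.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"unsupported query: {text!r}")
    columns = [c.strip() for c in m.group(1).split(",")]
    # Only plain identifiers are allowed, so they can be placed into SQL safely.
    if not all(re.fullmatch(r"\w+", c) for c in columns):
        raise ValueError(f"bad column list: {m.group(1)!r}")
    return Query(columns, m.group(2), m.group(3), m.group(4))


class SQLiteTranslator:
    """Translates a Query into real SQL for a SQLite database."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn

    def run(self, q: Query) -> list[dict]:
        sql = f"SELECT {', '.join(q.columns)} FROM {q.table} WHERE {q.field} = ?"
        rows = self.conn.execute(sql, (q.value,)).fetchall()
        return [dict(zip(q.columns, row)) for row in rows]


class CSVTranslator:
    """Translates the same Query into a filter over CSV text (table name -> CSV)."""

    def __init__(self, tables: dict[str, str]):
        self.tables = tables

    def run(self, q: Query) -> list[dict]:
        reader = csv.DictReader(io.StringIO(self.tables[q.table]))
        return [{c: row[c] for c in q.columns} for row in reader if row[q.field] == q.value]


def federated_query(text: str, translators) -> list[dict]:
    """Parse once, let every translator answer, and concatenate the results."""
    q = parse(text)
    results = []
    for t in translators:
        results.extend(t.run(q))
    return results


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE staff (name TEXT, city TEXT)")
db.executemany("INSERT INTO staff VALUES (?, ?)", [("Ada", "London"), ("Alan", "Manchester")])
csv_source = CSVTranslator({"staff": "name,city\nGrace,London\nEdsger,Austin\n"})

print(federated_query("SELECT name FROM staff WHERE city = 'London'",
                      [SQLiteTranslator(db), csv_source]))
```

The user writes one query; each translator is responsible for expressing it in whatever form its own source understands.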
Higher Level
The goal is to develop a global system sitting on top of the existing systems to provide the
desired level of integration of the data sources.
All existing data are integrated into a logically single database and managed consistently
under a single global control authority.
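A minimal sketch of this higher level, assuming the global system is a single SQLite database with one global schema. The source records, field names and the `MAPPINGS` table are hypothetical; they show how two sources that name the same concept differently can be loaded into one logically unique table under one transaction.

```python
import sqlite3

# Hypothetical sources that describe the same concept with different field names.
SOURCE_A = [{"cust_no": 1, "full_name": "Ada Lovelace"}]
SOURCE_B = [{"id": "2", "name": "Grace Hopper"}]

# Mapping from each source's fields to the global schema (assumed for illustration).
MAPPINGS = {
    "A": {"customer_id": lambda r: int(r["cust_no"]), "name": lambda r: r["full_name"]},
    "B": {"customer_id": lambda r: int(r["id"]), "name": lambda r: r["name"]},
}


def build_global_db(sources: dict[str, list[dict]]) -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE customer ("
        "customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL, source TEXT NOT NULL)"
    )
    with db:  # one transaction: the whole load commits or rolls back together
        for source_name, records in sources.items():
            mapping = MAPPINGS[source_name]
            for rec in records:
                db.execute(
                    "INSERT INTO customer VALUES (?, ?, ?)",
                    (mapping["customer_id"](rec), mapping["name"](rec), source_name),
                )
    return db


db = build_global_db({"A": SOURCE_A, "B": SOURCE_B})
print(db.execute("SELECT customer_id, name, source FROM customer ORDER BY customer_id").fetchall())
# [(1, 'Ada Lovelace', 'A'), (2, 'Grace Hopper', 'B')]
```

Unlike the request-by-request flow, the data here is physically consolidated, so the single database (the "global control authority") can enforce constraints such as the primary key across all sources.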
Data Quality - The perception or assessment of data’s fitness to serve its purpose in a given
context.
Data Quality Assurance - The process of verifying the reliability and effectiveness of data.
Data Completeness - Concerns the degree to which all data relevant to an application domain has
been recorded in an information system.
Data Uniqueness - States that two or more values do not conflict with each other.
Data Consistency - Expresses the degree to which a set of data satisfies a set of integrity constraints.
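The three dimensions above can be turned into simple measurements. This is a sketch under stated assumptions: the records, field names and the age constraint are hypothetical, completeness is measured as the share of required values that are not missing, uniqueness is checked as "records sharing a key must not carry conflicting values", and consistency as the share of records satisfying every constraint.

```python
from collections import defaultdict

# Hypothetical customer records; None marks a value that was never recorded.
RECORDS = [
    {"id": 1, "email": "ada@example.org", "age": 36},
    {"id": 2, "email": None, "age": 45},
    {"id": 1, "email": "ada@example.org", "age": 37},
    {"id": 3, "email": "alan@example.org", "age": -4},
]


def completeness(records, fields):
    """Fraction of required field values that are actually recorded (not None)."""
    total = len(records) * len(fields)
    filled = sum(1 for r in records for f in fields if r.get(f) is not None)
    return filled / total if total else 1.0


def conflicting_keys(records, key, fields):
    """Keys whose records disagree on some field, i.e. violate uniqueness."""
    values = defaultdict(set)
    for r in records:
        for f in fields:
            values[(r[key], f)].add(r.get(f))
    return sorted({k for (k, _f), seen in values.items() if len(seen) > 1})


def consistency(records, constraints):
    """Fraction of records that satisfy every integrity constraint."""
    if not records:
        return 1.0
    ok = sum(1 for r in records if all(check(r) for check in constraints))
    return ok / len(records)


# Hypothetical integrity constraint: a recorded age must not be negative.
CONSTRAINTS = [lambda r: r["age"] is None or r["age"] >= 0]

print(completeness(RECORDS, ["email", "age"]))            # 0.875
print(conflicting_keys(RECORDS, "id", ["email", "age"]))  # [1]
print(consistency(RECORDS, CONSTRAINTS))                  # 0.75
```

Customer 1 appears twice with ages 36 and 37, which is exactly the kind of conflict the uniqueness rule forbids; the negative age breaks the consistency constraint; the missing e-mail lowers completeness.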
Understand Data Needs - This is about delivering the right data to the right application in order to
achieve the right business result.
Understand Business Timing Needs - IT has an almost sacred duty to deliver data where it is
needed, when it is needed.
Everything Has a Date and Time Stamp - Many legacy systems did not record a timestamp for the
activity on the data record, so they are sometimes unable to identify what has changed. In this case
the data integration solution must time-stamp the record itself.
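One way an integration layer can time-stamp records from a source that keeps no timestamps is to detect changes itself. The sketch below assumes change detection by hashing each record and comparing it with the hash stored from the previous load; `fingerprint`, `stamp_changes` and the `changed_at` field are hypothetical names.

```python
import hashlib
import json
from datetime import datetime, timezone


def fingerprint(record: dict) -> str:
    # Canonical JSON (sorted keys) so that field order does not change the hash.
    return hashlib.sha256(json.dumps(record, sort_keys=True).encode("utf-8")).hexdigest()


def stamp_changes(previous: dict, incoming: list[dict], key: str, now=None) -> list[dict]:
    """Return only new or changed records, each stamped with the load time.

    `previous` maps key -> fingerprint from the last load and is updated in place.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    changed = []
    for rec in incoming:
        fp = fingerprint(rec)
        if previous.get(rec[key]) != fp:
            previous[rec[key]] = fp
            changed.append({**rec, "changed_at": now.isoformat()})
    return changed


state: dict = {}
print(stamp_changes(state, [{"id": 1, "x": "a"}, {"id": 2, "x": "b"}], "id"))  # both new
print(stamp_changes(state, [{"id": 1, "x": "a"}, {"id": 2, "x": "c"}], "id"))  # only id 2
```

The timestamp records when the integration layer first saw the change, not when it happened in the source, which is the best that can be done when the source itself keeps no time information.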
Externally Provided Data is Always Suspect - We rely increasingly on data from external sources,
particularly within the data warehouse, yet such data should always be tested to ensure it adds
value.
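Testing external data before it reaches the warehouse can be as simple as checking each row against an expected schema and quarantining failures. The feed, its fields and the `EXPECTED` rules below are hypothetical, chosen only to show the pattern.

```python
# Hypothetical expected schema for an external feed: field -> (type, check).
EXPECTED = {
    "country_code": (str, lambda v: len(v) == 2 and v.isupper()),
    "exchange_rate": (float, lambda v: v > 0),
}


def split_external(rows):
    """Separate rows that pass every check from rows to be quarantined."""
    accepted, rejected = [], []
    for row in rows:
        problems = []
        for field, (expected_type, check) in EXPECTED.items():
            value = row.get(field)
            if not isinstance(value, expected_type):
                problems.append(f"{field}: expected {expected_type.__name__}")
            elif not check(value):
                problems.append(f"{field}: failed check")
        if problems:
            rejected.append((row, problems))
        else:
            accepted.append(row)
    return accepted, rejected


feed = [
    {"country_code": "GB", "exchange_rate": 1.27},
    {"country_code": "gbr", "exchange_rate": 1.1},
    {"country_code": "US"},
]
good, bad = split_external(feed)
print(good)  # [{'country_code': 'GB', 'exchange_rate': 1.27}]
print(bad)
```

Rejected rows are kept together with the reasons they failed, so they can be reviewed or reported back to the data provider instead of being silently dropped.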
Integrate Master Data and Governance Rules - Where master data management (MDM) solutions have been
implemented, the MDM system becomes the centre of the hub for particular types of data. For example,
all customer data must be validated against the customer master before being forwarded to other
systems that require the data.
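A hub that validates against a customer master before forwarding can be sketched as below. This is an assumption-laden illustration: the master is a plain dictionary, "validation" means the customer must exist in the master, and master attributes overwrite the incoming copy before it is forwarded. `CUSTOMER_MASTER` and `Hub` are hypothetical.

```python
# Hypothetical customer master: customer_id -> golden record.
CUSTOMER_MASTER = {
    "C001": {"name": "Ada Lovelace", "country": "GB"},
    "C002": {"name": "Grace Hopper", "country": "US"},
}


class Hub:
    def __init__(self, master: dict):
        self.master = master
        self.subscribers = []  # downstream systems, modelled as callables
        self.rejected = []     # records that failed validation

    def subscribe(self, system) -> None:
        self.subscribers.append(system)

    def publish(self, record: dict) -> bool:
        golden = self.master.get(record.get("customer_id"))
        if golden is None:
            # Unknown customer: hold the record back instead of forwarding it.
            self.rejected.append(record)
            return False
        # Master attributes win, so every downstream system sees the same values.
        validated = {**record, **golden}
        for system in self.subscribers:
            system(validated)
        return True


received = []
hub = Hub(CUSTOMER_MASTER)
hub.subscribe(received.append)
hub.publish({"customer_id": "C001", "name": "A. Lovelace", "order": 42})
hub.publish({"customer_id": "C999", "order": 7})
print(received)      # one record, with the master's name and country
print(hub.rejected)  # the C999 record
```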
Retain a History - There is increasing statutory pressure to maintain a history of all
transactions. An effective data integration solution should work with archiving solutions to make
this happen.
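Retaining history means never overwriting a record in place. The sketch below swaps in one common technique, versioning in the style of a "Type 2" slowly changing dimension: each change closes the current version and appends a new one. The `HistoryTable` class is hypothetical; a real solution would hand closed versions over to an archiving system.

```python
from datetime import date


class HistoryTable:
    """Keeps every version of a record instead of overwriting it."""

    def __init__(self):
        # Each row: key, data, valid_from, valid_to (None means "current").
        self.rows = []

    def current(self, key):
        for row in self.rows:
            if row["key"] == key and row["valid_to"] is None:
                return row
        return None

    def upsert(self, key, data, effective: date) -> None:
        cur = self.current(key)
        if cur is not None:
            if cur["data"] == data:
                return  # unchanged: no new version needed
            cur["valid_to"] = effective  # close the old version, do not delete it
        self.rows.append({"key": key, "data": data, "valid_from": effective, "valid_to": None})

    def lookup(self, key, when: date):
        """Return the data that was valid for `key` on date `when`."""
        for row in self.rows:
            if (row["key"] == key and row["valid_from"] <= when
                    and (row["valid_to"] is None or when < row["valid_to"])):
                return row["data"]
        return None


h = HistoryTable()
h.upsert("C001", {"city": "London"}, date(2020, 1, 1))
h.upsert("C001", {"city": "Paris"}, date(2023, 6, 1))
print(h.lookup("C001", date(2021, 1, 1)))  # {'city': 'London'}
print(h.lookup("C001", date(2024, 1, 1)))  # {'city': 'Paris'}
```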
Evolve, Evolve, Evolve - The data integration hub will need to evolve to meet new corporate
goals. The data integration team will be at the heart of any project that manages data, so it will
need to know about all new data used within the organization.