
Topic 7: Data Integration and Quality

Data Integration

It is a process in which heterogeneous data is retrieved and combined into a unified form and structure.

It is the process of unifying the underlying data sets.

It implies the ability to access any source of data within or outside your enterprise.

It allows different data types (such as data sets, documents and tables) to be merged by users,
organizations and applications for use in personal or business processes and/or functions.

An example of data integration on a smaller scale is embedding a spreadsheet in a Microsoft
Word document.

How Data Integration Works

1. The client sends a request for integrated data to a master web server.
2. The web server sends requests for individual data sets to the appropriate databases.
3. The databases respond by sending the requested data back to the master server.
4. The master server combines the data into a unified view.
5. The master server sends the unified view to the client.
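
The flow above can be sketched in a few lines of Python; the two fetch functions and the field names are purely illustrative stand-ins for real database requests, not an actual API.

# Minimal sketch of the request flow above (all names are illustrative).
# Each "database" is simulated by a function that returns data for a customer id.

def fetch_orders(customer_id):
    # Stand-in for a request sent to the orders database.
    return [{"order_id": 101, "amount": 250.0}]

def fetch_profile(customer_id):
    # Stand-in for a request sent to the customer database.
    return {"customer_id": customer_id, "name": "A. Example"}

def build_unified_view(customer_id):
    """Master-server role: request each data set, then combine them into one view."""
    profile = fetch_profile(customer_id)      # steps 2-3: request and receive
    orders = fetch_orders(customer_id)        # steps 2-3: request and receive
    return {**profile, "orders": orders}      # step 4: combine into a unified view

if __name__ == "__main__":
    print(build_unified_view(42))             # step 5: return the view to the client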

Different Levels of Integration


Lowest Level
The goal is to enable one DBMS to request and obtain data from another DBMS in a typical
client/server mode. Gateways, i.e. dedicated packages, support this limited functionality.
The best-known gateway standard is ODBC (Open Database Connectivity), an SQL-based
standard from Microsoft.
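
As a hedged illustration, gateway-style access through ODBC might look roughly like the following Python sketch using the pyodbc package; the DSN, credentials, table and column names are placeholders, not a real configuration.

# Sketch of gateway-style access via ODBC using the pyodbc package.
# The DSN, credentials, table and column names are placeholders for illustration.
import pyodbc

conn = pyodbc.connect("DSN=SalesDB;UID=reader;PWD=secret")   # connect through the ODBC gateway
cursor = conn.cursor()
cursor.execute("SELECT customer_id, total FROM orders WHERE total > ?", 100)
for row in cursor.fetchall():
    print(row.customer_id, row.total)
conn.close()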

Intermediate Level
The goal is to support user-driven access and/or integration of data from multiple
databases.
The term user-driven refers to the fact that users are given the ability to manipulate data
from several sources simultaneously in some uniform way.
To implement such a framework, a software layer is developed whose functionality ranges
from:
A multi-database query language, e.g. OEM-QL or MySQL.
Providing a single SQL-like syntax that is understood by a set of translators.
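
A minimal sketch of that software layer, assuming two hypothetical back-ends: one uniform query is handed to per-source translators, each of which rewrites it for its own source. All names and query forms are illustrative assumptions.

# Sketch of the intermediate level: one uniform query is handed to per-source
# translators, each of which rewrites it for its own back-end.

def translate_for_sql_source(query):
    # Rewrite the uniform query into the SQL dialect of source A.
    return f"SELECT * FROM customers WHERE name = '{query['name']}'"

def translate_for_document_source(query):
    # Rewrite the same query into a filter for a document store (source B).
    return {"collection": "customers", "filter": {"name": query["name"]}}

def federated_query(query):
    """Software layer: one SQL-like request, many translators."""
    return {
        "source_a": translate_for_sql_source(query),
        "source_b": translate_for_document_source(query),
    }

print(federated_query({"name": "Okoth"}))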

Higher Level

The goal is to develop a global system sitting on top of the existing systems to provide the
desired level of integration of the data sources.
All existing data are integrated into a logically unique database and managed in a consistent
way under a single global control authority.

Steps of Integrating Data


Pre-integration- Input schemas are re-arranged in various ways to make them homogeneous
(syntactically and semantically).
Correspondence identification- This step is devoted to the identification of related items in
the input schemas and the precise description of those inter-schema relationships.
Integration- It unifies corresponding items into an integrated schema and produces the
associated mappings.
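
A toy Python walk-through of the three steps, where the input schemas are simple column-to-type maps and the name-based matching rule is only an illustrative assumption, not a real matching algorithm.

# Toy walk-through of the three steps on two input schemas (column-name -> type maps).
schema_a = {"cust_id": "int", "name": "text"}
schema_b = {"customer_id": "int", "name": "text", "email": "text"}

# Pre-integration: normalise names so the schemas become syntactically comparable.
def normalise(column):
    return column.lower().replace("cust_", "customer_").replace("_", "")

# Correspondence identification: pair up columns whose normalised names match.
matches = [(a, b) for a in schema_a for b in schema_b if normalise(a) == normalise(b)]

# Integration: unify corresponding items into one schema and record the mappings.
integrated_schema = dict(schema_b)            # start from the richer schema
mappings = {a: b for a, b in matches}         # column in schema_a -> integrated column

print(matches)    # [('cust_id', 'customer_id'), ('name', 'name')]
print(mappings)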

Data Quality- It is a perception or an assessment of data’s fitness to serve its purpose in a given
context.

Data Quality Assurance- The process of verifying the reliability and effectiveness of data.

Database quality depends on:

The quality of the conceptual data model.

The quality of the stored data.

The quality of the processes on the data.

Quality of the data model:

Data Completeness- Concerns the degree to which all data relevant to an application domain has
been recorded in an information system.

Data Uniqueness- States that two or more values do not conflict with each other.

Data Consistency- Expresses the degree to which a set of data satisfies a set of integrity constraints.
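
These three dimensions can be checked mechanically. The sketch below, with made-up records and an assumed integrity constraint (quantity must be non-negative), shows one possible way to score them.

# Sketch of simple checks for the three dimensions on a small record set.
records = [
    {"id": 1, "name": "Pump",  "quantity": 4},
    {"id": 2, "name": None,    "quantity": 7},
    {"id": 2, "name": "Valve", "quantity": -1},
]

# Completeness: share of required fields that have actually been recorded.
required = ("id", "name", "quantity")
filled = sum(1 for r in records for f in required if r.get(f) is not None)
completeness = filled / (len(records) * len(required))

# Uniqueness: no two records should share a key value.
ids = [r["id"] for r in records]
unique = len(ids) == len(set(ids))

# Consistency: every record satisfies the assumed integrity constraint.
consistent = all(r["quantity"] is not None and r["quantity"] >= 0 for r in records)

print(f"completeness={completeness:.2f}, unique={unique}, consistent={consistent}")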

Best Practices in Data Quality Improvement

1. Establish a data quality environment.
2. Assess data definitions.
3. Collect quality facts.
4. Identify issues.
5. Assess impact.
6. Investigate causes- There are two basic approaches. Error cluster analysis uses the
information contained in the database to provide clues to where the inaccuracies
may be coming from (a sketch follows this list). Data event analysis studies the events
where data is created and changed in order to help identify the root causes of problems.
7. Propose and implement remedies.
8. Monitor the results.
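
A minimal sketch of error cluster analysis (step 6), assuming the known-bad records already carry a source-system and field label; the data is made up for illustration.

# Group known-bad records by their source system and field to see where
# the inaccuracies cluster.
from collections import Counter

bad_records = [
    {"source": "branch_a", "field": "email"},
    {"source": "branch_a", "field": "email"},
    {"source": "branch_b", "field": "phone"},
    {"source": "branch_a", "field": "phone"},
]

clusters = Counter((r["source"], r["field"]) for r in bad_records)
for (source, field), count in clusters.most_common():
    print(f"{count} errors in {field!r} from {source!r}")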

Challenges of Data Integration

Understand data needs- It is about delivering the right data to the right application in order to
achieve the right business result.

Understand Business Timing Needs- It is IT's almost sacred duty to deliver the data where it is
needed, when it is needed.
Everything has a date and time stamp- Many old legacy systems did not record a time stamp
for the activity on the data record, and sometimes systems are unable to identify what has changed.
The data integration solution must time-stamp the record in this case.
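
A small sketch of that rule, assuming the integration layer simply stamps each incoming record with a load time when the source provides none; field names are illustrative.

# If a legacy source supplies no activity time stamp, the integration layer
# stamps each record on arrival so downstream change detection is possible.
from datetime import datetime, timezone

def stamp_if_missing(record):
    record.setdefault("loaded_at", datetime.now(timezone.utc).isoformat())
    return record

legacy_row = {"account": "A-17", "balance": 120.50}   # no time stamp from the source
print(stamp_if_missing(legacy_row))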
Externally Provided Data is Always Suspect- There is an increasing reliance on data from
external sources, particularly within the data warehouse, yet at the same time such data should
be tested to ensure it adds value.

Integrate Master Data and Governance Rules- Where MDM solutions have been implemented,
the MDM becomes the hub for particular types of data, e.g. all customer data
must be validated against the customer master. In this case customer data must be validated by the
customer master before being forwarded to other systems that require the data.
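
A hedged sketch of that rule, with an in-memory set standing in for the customer master and a list standing in for a downstream system; identifiers are made up.

# Validate incoming customer data against the customer master before forwarding it.
customer_master = {"C-001", "C-002", "C-003"}          # ids held by the MDM hub

def forward_if_valid(record, downstream):
    if record["customer_id"] in customer_master:       # validate against the master
        downstream.append(record)                      # forward to systems that need it
        return True
    return False                                       # reject or route to remediation

billing_queue = []
print(forward_if_valid({"customer_id": "C-002", "amount": 30}, billing_queue))
print(forward_if_valid({"customer_id": "C-999", "amount": 10}, billing_queue))
print(billing_queue)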

Retain a History- There is increasing statutory pressure to maintain a history of all
transactions. An effective data integration solution should work with archiving solutions to make
this happen.
Evolve, Evolve, Evolve- The data integration hub will need to evolve to meet new corporate
goals. The data integration team will be at the heart of any project that manages data. They will
need to know about all new data used within the organization.
