Data Integration in Data Mining
Data integration is the process of merging data from several disparate sources. While performing
data integration, you must deal with data redundancy, inconsistency, duplication, and so on. In data
mining, data integration is a data preprocessing step that combines data from multiple heterogeneous
sources into a coherent store, providing a unified view of the data. These sources may include
multiple data cubes, databases, or flat files. The data integration approach is formally stated as a
triple (G, S, M), where G represents the global schema, S represents the schemas of the heterogeneous
sources, and M represents the mapping between queries over the source schemas and queries over the
global schema.
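To make the triple more concrete, here is a minimal Python sketch of (G, S, M): two hypothetical source schemas are mapped onto a single global schema. All table and column names are assumptions made for this illustration, not part of any standard.

```python
# A minimal sketch of the (G, S, M) triple: a global schema G,
# two source schemas S, and a mapping M from source columns to G.
# All schema and column names here are hypothetical.

G = ["customer_id", "full_name", "country"]           # global schema

S = {
    "crm_db":   ["cust_no", "name", "country_code"],  # source schema 1
    "web_shop": ["id", "customer_name", "country"],   # source schema 2
}

M = {  # mapping: (source, source column) -> global column
    ("crm_db", "cust_no"): "customer_id",
    ("crm_db", "name"): "full_name",
    ("crm_db", "country_code"): "country",
    ("web_shop", "id"): "customer_id",
    ("web_shop", "customer_name"): "full_name",
    ("web_shop", "country"): "country",
}

def to_global(source: str, record: dict) -> dict:
    """Rewrite one source record into the global schema using M."""
    return {M[(source, col)]: value for col, value in record.items()}

print(to_global("crm_db", {"cust_no": 7, "name": "Ada", "country_code": "UK"}))
```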
In this article, you will learn about data integration in data mining, including its methods, issues,
techniques, and tools.
Data integration is important because it provides a uniform view of data scattered across sources while
also maintaining data accuracy. It helps the data mining program extract meaningful information, which
in turn helps executives and managers make strategic decisions for the enterprise's benefit.
There are two main approaches to data integration: tight coupling and loose coupling.
Tight Coupling
It is the process of using ETL (Extraction, Transformation, and Loading) to combine data
from various sources into a single physical location.
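As a rough sketch, the ETL flow behind tight coupling can be illustrated with pandas and SQLite: data is extracted from two sources, transformed into a common schema, and loaded into one physical store. The source data, column names, and the warehouse.db file are assumptions for this example.

```python
import sqlite3
import pandas as pd

# Extract: read from two hypothetical, heterogeneous sources.
crm = pd.DataFrame({"cust_no": [1, 2], "name": ["Ada", "Bob"]})
shop = pd.DataFrame({"id": [2, 3], "customer_name": ["Bob", "Cleo"]})

# Transform: rename columns so both sources share the global schema.
crm = crm.rename(columns={"cust_no": "customer_id", "name": "full_name"})
shop = shop.rename(columns={"id": "customer_id", "customer_name": "full_name"})
unified = pd.concat([crm, shop], ignore_index=True)

# Load: store the unified view in a single physical location (SQLite here).
with sqlite3.connect("warehouse.db") as conn:
    unified.to_sql("customers", conn, if_exists="replace", index=False)
```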
Loose Coupling
With loose coupling, the data is kept in the actual source databases. This approach provides an
interface that takes a query from the user, translates it into a format that each source database
can understand, and then sends the query directly to the source databases to obtain the result.
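The sketch below illustrates this idea: a mediator keeps no copy of the data, it translates a query on the global schema into each source's own format at query time and merges the answers. The two "source" functions and their fields are hypothetical stand-ins for real heterogeneous databases.

```python
# Loose coupling sketch: data stays in the sources; a mediator translates
# a global-schema query for each source and merges the results on demand.

def query_crm(country_code):
    rows = [{"cust_no": 1, "name": "Ada", "country_code": "UK"}]
    return [r for r in rows if r["country_code"] == country_code]

def query_shop(country):
    rows = [{"id": 3, "customer_name": "Cleo", "country": "UK"}]
    return [r for r in rows if r["country"] == country]

def mediator(country):
    """Translate one global query into source-specific queries and merge."""
    results = []
    for r in query_crm(country):
        results.append({"customer_id": r["cust_no"], "full_name": r["name"]})
    for r in query_shop(country):
        results.append({"customer_id": r["id"], "full_name": r["customer_name"]})
    return results

print(mediator("UK"))
```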
Redundancy and Correlation Analysis
Inconsistencies in attribute naming or representation can further increase redundancy in the
integrated data. Correlation analysis can be used to detect such redundancy: the attributes are
examined to determine how strongly one depends on the other, thereby revealing the link between them.
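As a small sketch, correlation analysis between numeric attributes can be done with pandas; a correlation coefficient close to +1 or -1 suggests one attribute is largely redundant given the other. The data and column names below are made up for illustration.

```python
import pandas as pd

# Hypothetical integrated data with a possibly redundant attribute pair.
df = pd.DataFrame({
    "price_usd": [10.0, 20.0, 35.0, 50.0],
    "price_eur": [9.2, 18.4, 32.2, 46.0],   # derived from price_usd
    "quantity":  [3, 1, 4, 2],
})

# Pearson correlation: values near +/-1 indicate likely redundancy.
corr = df.corr()
print(corr.loc["price_usd", "price_eur"])   # ~1.0 -> redundant attribute
```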
Tuple Duplication
In addition to redundancy, data integration must also deal with duplicate tuples. Duplicate tuples
may appear in the resulting data if a denormalized table was used as a source for data integration.
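A quick way to detect and remove duplicate tuples after integration is sketched below with pandas; the sample records are made up for this example.

```python
import pandas as pd

# Two sources may contribute the same customer twice after integration.
merged = pd.DataFrame({
    "customer_id": [1, 2, 2, 3],
    "full_name":   ["Ada", "Bob", "Bob", "Cleo"],
})

print(merged.duplicated().sum())         # count duplicate tuples
deduplicated = merged.drop_duplicates()  # keep the first occurrence
print(deduplicated)
```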
Manual Integration
This method avoids using automation during data integration. The data analyst collects, cleans, and
integrates the data by hand to produce meaningful information. This strategy is suitable for a small
organization with a limited data set. However, because the entire process must be done manually, it
becomes time-consuming for large, complex, and recurring integrations.
Middleware Integration
Middleware software is used to take data from many sources, normalize it, and store it in the
resulting data set. This technique is used when an enterprise needs to integrate data from legacy
systems into modern systems. The middleware acts as a translator between the legacy and modern
systems; you can think of it as an adapter that allows two systems with different interfaces to be
connected. It is only applicable to certain systems.
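The adapter idea can be sketched as follows; the fixed-width legacy record format and the modern schema shown here are assumptions chosen purely for illustration.

```python
# Middleware sketch: an adapter translates records from a legacy system's
# format into the schema the modern system expects. Formats are hypothetical.

class LegacyAdapter:
    """Normalizes pipe-delimited legacy records into modern dictionaries."""

    def translate(self, legacy_record: str) -> dict:
        # Assumed legacy format: "ID:0001|NAME:ADA LOVELACE|CTRY:UK"
        fields = dict(part.split(":", 1) for part in legacy_record.split("|"))
        return {
            "customer_id": int(fields["ID"]),
            "full_name": fields["NAME"].title(),
            "country": fields["CTRY"],
        }

adapter = LegacyAdapter()
print(adapter.translate("ID:0001|NAME:ADA LOVELACE|CTRY:UK"))
```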
Application-based integration
This technique uses software applications to extract, transform, and load data from disparate
sources. It saves time and effort, but it is a little more complicated because building such an
application requires technical understanding.
Data Warehousing
This technique is loosely related to the uniform access integration technique, except that the
unified view is stored in a separate location. This enables the data analyst to handle more
sophisticated queries. Although it is a promising solution, the unified view or copy of the data
requires separate storage, which adds storage and maintenance costs.
Integration tools
There are various data integration tools used in data mining. Some of them are as follows: