Data Integration in Data Mining
Data integration is a data preprocessing technique that combines data from multiple sources and
gives users a unified view of that data.
These sources may include multiple databases, data cubes, or flat files. One of the most
well-known implementations of data integration is building an enterprise's data warehouse.
A data warehouse enables a business to perform analyses on the consolidated data that the
warehouse holds.
1 Tight Coupling
In tight coupling, data from different sources is combined into a single physical location through
the ETL process (Extraction, Transformation, and Loading).
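
The ETL steps can be illustrated with a minimal sketch. It assumes two hypothetical sources (a CSV
flat file and a small SQLite database) and an in-memory SQLite database standing in for the
warehouse. All table names, column names, and sample values are invented for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical source 1: a flat file (CSV) with amounts in dollars.
CSV_SOURCE = "customer,amount_usd\nalice,10.50\nbob,4.00\n"


def extract_csv(text):
    """Extract: read rows from the flat-file source."""
    return list(csv.DictReader(io.StringIO(text)))


def extract_db(conn):
    """Extract: read rows from the database source."""
    return conn.execute("SELECT cust_name, amount_cents FROM orders").fetchall()


def transform(csv_rows, db_rows):
    """Transform: map both sources onto one schema (lowercase name, integer cents)."""
    unified = [
        (row["customer"].strip().lower(), round(float(row["amount_usd"]) * 100))
        for row in csv_rows
    ]
    unified += [(name.strip().lower(), cents) for name, cents in db_rows]
    return unified


def load(warehouse, rows):
    """Load: write the unified rows into the single physical store."""
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount_cents INTEGER)"
    )
    warehouse.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    warehouse.commit()


def run_etl():
    # Hypothetical source 2: a database whose schema differs from the CSV.
    source_db = sqlite3.connect(":memory:")
    source_db.execute("CREATE TABLE orders (cust_name TEXT, amount_cents INTEGER)")
    source_db.executemany(
        "INSERT INTO orders VALUES (?, ?)", [("Alice", 250), ("Carol", 700)]
    )
    warehouse = sqlite3.connect(":memory:")
    load(warehouse, transform(extract_csv(CSV_SOURCE), extract_db(source_db)))
    return warehouse


warehouse = run_etl()
# Analyses now run against the warehouse only, not against the sources.
totals = dict(
    warehouse.execute("SELECT customer, SUM(amount_cents) FROM sales GROUP BY customer")
)
```

After the load, queries such as the per-customer total touch only the warehouse copy, which is the
defining property of tight coupling.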
2 Loose Coupling
In loose coupling, the data remains only in the original source databases. In this approach, an
interface takes a query from the user, transforms it into a form each source database can
understand, and sends it directly to the source databases to obtain the result.
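
A minimal sketch of such an interface follows. It assumes two hypothetical in-memory SQLite sources
with different schemas; the Mediator class, its methods, and all table and column names are invented
for illustration and do not refer to any particular product.

```python
import sqlite3


def make_source(ddl, insert_sql, rows):
    """Create a hypothetical source database held in memory."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    return conn


class Mediator:
    """Query interface: no data is copied, each query goes to the sources."""

    def __init__(self):
        self.sources = []  # pairs of (connection, source-specific SQL)

    def register(self, conn, sql):
        # The SQL string is the per-source translation of the user's query.
        self.sources.append((conn, sql))

    def customers_in(self, city):
        results = []
        for conn, sql in self.sources:
            # Send the translated query directly to the source database.
            results.extend(name for (name,) in conn.execute(sql, (city,)))
        return sorted(results)


crm = make_source(
    "CREATE TABLE clients (full_name TEXT, town TEXT)",
    "INSERT INTO clients VALUES (?, ?)",
    [("Alice", "Oslo"), ("Bob", "Rome")],
)
billing = make_source(
    "CREATE TABLE accounts (holder TEXT, city TEXT)",
    "INSERT INTO accounts VALUES (?, ?)",
    [("Carol", "Oslo")],
)

mediator = Mediator()
mediator.register(crm, "SELECT full_name FROM clients WHERE town = ?")
mediator.register(billing, "SELECT holder FROM accounts WHERE city = ?")
oslo = mediator.customers_in("Oslo")
```

Unlike the ETL approach, the result is assembled at query time, so it always reflects the current
contents of the sources, at the cost of querying every source on each request.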
One performance challenge in data mining is the efficiency and scalability of data mining
algorithms. To extract information effectively from the large amounts of data held in databases, these
algorithms must be efficient and scalable, with predictable and acceptable running times.
Another challenge is the parallel, distributed, and incremental processing of data mining algorithms.
The need for parallel and distributed data mining algorithms arises from the huge size of
many databases, the wide distribution of data, and the computational complexity of some data mining
methods. Because some data mining processes are costly, incremental data mining algorithms
incorporate database updates without mining the entire data set again from scratch.
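
The incremental idea can be sketched with a deliberately simple case: counting frequent single items
across transactions. The class and method names below are hypothetical, and the sketch handles only
newly inserted transactions (not deletions or full itemset mining).

```python
from collections import Counter


class IncrementalItemCounter:
    """Keeps item counts so new transactions can be added without a rescan."""

    def __init__(self):
        self.counts = Counter()
        self.n_transactions = 0

    def update(self, transactions):
        # Only the new batch is scanned; earlier data is never read again.
        for transaction in transactions:
            self.counts.update(set(transaction))  # count each item once per transaction
            self.n_transactions += 1

    def frequent_items(self, min_support):
        # An item is frequent if it occurs in at least min_support of all transactions.
        threshold = min_support * self.n_transactions
        return sorted(item for item, c in self.counts.items() if c >= threshold)


miner = IncrementalItemCounter()
miner.update([["milk", "bread"], ["milk", "eggs"]])
first = miner.frequent_items(0.5)

# A database update arrives: only these two transactions are processed.
miner.update([["bread"], ["bread", "milk"]])
second = miner.frequent_items(0.5)
```

Each update costs time proportional to the size of the new batch rather than to the whole database,
which is the saving that incremental algorithms aim for.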