1.6 - Data Integration, 1.10 - Transformation
1.6 DATA INTEGRATION:
It is a technique to merge data from multiple sources into a coherent data store, such as a
data warehouse.
It is a preprocessing method that involves merging data files from different sources (flat
files, multi-dimensional databases, data cubes, etc.) in order to form a data store such as a
data warehouse.
Assume that our data warehouse is provided with data from two different companies,
named A and B.
Ex.:

Company A:
Emp.No  Name   DOB  Age  Price
1       John   ….   28   ₹100
2       Rasul  ….   32   ₹500

Company B:
Emp.ID  Name  Price
4       Siva  $100
5       Ram   $500
While maintaining and integrating the data in the warehouse, the following issues may
arise:
- Schema integration and object matching
Both companies store the same kind of value (a numeric identifier) under different
attribute names, Emp.No and Emp.ID. During automated integration, the system cannot
tell on its own that these two attributes refer to the same thing, so it may simply lump
all the numbers together in one place.
- Redundancy (unwanted attributes)
In Company A's table, the attributes DOB and Age are redundant, since each implies
the other: if DOB is known, Age can be calculated, and vice versa. So one of them
can simply be ignored or removed.
- Detection and resolution of data value conflicts
Here, Company A represents Price in rupees while Company B uses dollars, so simple
replacement or substitution is not sufficient: $100 and ₹100 are different amounts.
The mismatch must not only be detected but also resolved by correctly converting the
values (a short sketch follows this list).
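To make these three issues concrete, here is a minimal sketch in Python using pandas. The unified column name EmpID, the DOB values, and the INR-to-USD rate are illustrative assumptions, not data from these notes:

```python
import pandas as pd

# Company A: prices in rupees; DOB values are made-up placeholders
# (the notes elide the real dates)
company_a = pd.DataFrame({
    "Emp.No": [1, 2],
    "Name": ["John", "Rasul"],
    "DOB": ["1996-03-15", "1992-07-02"],   # hypothetical dates
    "Age": [28, 32],
    "Price": [100, 500],                   # rupees
})

# Company B: the same kind of records under different attribute names,
# with prices in dollars
company_b = pd.DataFrame({
    "Emp.ID": [4, 5],
    "Name": ["Siva", "Ram"],
    "Price": [100, 500],                   # dollars
})

# 1) Schema integration / object matching: map Emp.No and Emp.ID onto
#    one common attribute name before merging.
company_a = company_a.rename(columns={"Emp.No": "EmpID"})
company_b = company_b.rename(columns={"Emp.ID": "EmpID"})

# 2) Redundancy: Age is implied by DOB, so drop it.
company_a = company_a.drop(columns=["Age"])

# 3) Data value conflicts: convert rupees to dollars before combining
#    (83 INR per USD is an assumed placeholder rate).
INR_PER_USD = 83.0
company_a["Price"] = (company_a["Price"] / INR_PER_USD).round(2)

# The two sources can now be merged into one coherent store; columns
# missing from one source (here DOB for Company B) become NaN.
warehouse = pd.concat([company_a, company_b], ignore_index=True)
print(warehouse)
```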
https://www.youtube.com/watch?v=UKUq7hZdZUw (Data Integration)
1.10 DATA TRANSFORMATION:
It is a data preprocessing technique that transforms or consolidates the data into alternative
forms appropriate for mining.
Here, the following four steps are involved:
a) Smoothing – removing the noise from the data
(using techniques such as binning, regression, clustering, etc.)
b) Aggregation – summary or aggregate functions are applied to the data; for example,
a data cube (multi-dimensional database) can be constructed. This is particularly
helpful for OLAP (On-Line Analytical Processing) operations.
c) Generalization – here, low-level concepts are replaced with higher-level concepts.
Ex.: in some databases, street may be replaced simply by city or country.
d) Normalization – here, attribute values are normalized by scaling them so that
they fall within a specified range.
Ex.: given a set of values like 2, 100, 1, 500, 35, 900, these must be scaled so
that they all fall within a chosen range, say 0 to 1.
This normalization process can be done in two ways (a code sketch covering both
methods follows at the end of this section):
Min-Max Normalization – in this method, the new value of an attribute is found
by the formula
v′ = (v − min_x) / (max_x − min_x), where
v′ is the new value,
v is the actual/original attribute value,
min_x and max_x are the minimum and maximum values of the given set of elements.
In the example above, for the first value, v = 2, min_x = 1 and max_x = 900, so
v′ = (2 − 1) / (900 − 1) ≈ 0.0011; the remaining values are scaled the same way.
Z-score Normalization (Zero-mean Normalization)
Here, the following formula is adopted:
v′ = (v − x̄) / σ_x, where
v′ is the new value,
v is the actual/original attribute value,
x̄ is the mean of the attribute,
σ_x is the standard deviation of the attribute.
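As a minimal sketch, both normalization methods can be applied to the example values above using NumPy (the numbers come from the example in step d; everything else is standard arithmetic):

```python
import numpy as np

values = np.array([2, 100, 1, 500, 35, 900], dtype=float)

# Min-max normalization: v' = (v - min_x) / (max_x - min_x),
# which scales every value into the range [0, 1].
v_min, v_max = values.min(), values.max()
min_max = (values - v_min) / (v_max - v_min)
print(min_max)   # first value: (2 - 1) / (900 - 1) ≈ 0.0011

# Z-score (zero-mean) normalization: v' = (v - mean) / std,
# after which the values have mean 0 and standard deviation 1.
mean, std = values.mean(), values.std()
z_score = (values - mean) / std
print(z_score)
```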
https://www.youtube.com/watch?v=RQ0I1u-q8N8 (Data Transformation)