2 Data Preprocessing
Dec 2024
Outline
Data: a set of objects/samples/vectors/instances/etc., placed on the rows of a table
Attribute Values
• Data Consolidation
– Brings data together from several separate systems
– The goal is to reduce the number of data storage locations.
• Data Propagation
– Data propagation is the use of applications to copy data from
one location to another.
– It is event-driven and can be done synchronously or
asynchronously.
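The difference between synchronous and asynchronous propagation can be sketched in a few lines. This is a toy illustration only: the `Store` and `Propagator` classes and the system names are hypothetical, and a real deployment would use replication or messaging tools rather than in-memory dictionaries.

```python
from collections import deque


class Store:
    """A toy key-value store standing in for a real system (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.rows = {}


class Propagator:
    """Copies every write on `source` to `target` (hypothetical helper)."""

    def __init__(self, source, target, synchronous=True):
        self.source = source
        self.target = target
        self.synchronous = synchronous
        self.pending = deque()  # change events waiting to be applied

    def write(self, key, value):
        # The write to the source is the event that triggers propagation.
        self.source.rows[key] = value
        if self.synchronous:
            self.target.rows[key] = value      # copied before write() returns
        else:
            self.pending.append((key, value))  # copied later, by flush()

    def flush(self):
        # Apply queued events in arrival order (asynchronous mode).
        while self.pending:
            key, value = self.pending.popleft()
            self.target.rows[key] = value


crm, billing, reporting = Store("crm"), Store("billing"), Store("reporting")

sync_prop = Propagator(crm, billing, synchronous=True)
sync_prop.write("cust-1", {"name": "Ada"})
sync_copied = "cust-1" in billing.rows

async_prop = Propagator(crm, reporting, synchronous=False)
async_prop.write("cust-2", {"name": "Lin"})
before_flush = "cust-2" in reporting.rows
async_prop.flush()
after_flush = "cust-2" in reporting.rows
```

In synchronous mode the target is up to date as soon as the write returns. In asynchronous mode the target lags behind until the queued events are applied.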
Data Integration Approaches
• Data Virtualization
– Uses an interface to provide a near real-time, unified view of
data from disparate sources with different data models.
• Data Federation
– A form of data virtualization
– Uses a virtual database and creates a common data model for
heterogeneous data from different systems
• Data Warehousing
– Data warehouses are storage repositories for data
– Data warehousing implies the cleansing, reformatting, and
storage of data
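As a rough sketch of the federation idea above, the following maps two sources with different data models onto one common model at query time, without copying the data. The source layouts, field names, and the `FederatedView` class are assumptions made up for illustration.

```python
# Two hypothetical source systems with different data models.
crm_rows = [{"CustomerID": 1, "FullName": "Ada Lovelace"}]
shop_rows = [(2, "Lin", "Chen")]  # (id, first_name, last_name)


def from_crm(row):
    # Map a CRM record onto the common model {"id", "name"}.
    return {"id": row["CustomerID"], "name": row["FullName"]}


def from_shop(row):
    # Map a shop tuple onto the common model {"id", "name"}.
    cust_id, first, last = row
    return {"id": cust_id, "name": f"{first} {last}"}


class FederatedView:
    """A virtual table: it stores mappings to the sources, not the data."""

    def __init__(self):
        self.sources = []

    def register(self, rows, mapper):
        self.sources.append((rows, mapper))

    def query(self, predicate=lambda record: True):
        # Sources are read at query time, so results reflect current data.
        for rows, mapper in self.sources:
            for row in rows:
                record = mapper(row)
                if predicate(record):
                    yield record


view = FederatedView()
view.register(crm_rows, from_crm)
view.register(shop_rows, from_shop)

all_customers = list(view.query())

shop_rows.append((3, "Sam", "Okoro"))  # new data arrives in a source system
after_update = list(view.query(lambda record: record["id"] == 3))
```

A data warehouse would instead cleanse and reformat these rows once and store the result, trading freshness for faster, more predictable queries.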
Data Reduction
– Wrapper approaches:
• Use the machine learning algorithm as a black box to find the best
subset of attributes
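One common way to realize a wrapper approach is greedy forward selection. The sketch below is a minimal illustration, not the method from the slides: the black-box learner is assumed to be a 1-nearest-neighbour classifier scored by leave-one-out accuracy, and the tiny dataset is invented so that attribute 0 separates the classes while attribute 1 is misleading.

```python
def loo_accuracy(X, y, features):
    """Black-box score: leave-one-out accuracy of a 1-nearest-neighbour
    classifier that only sees the attribute indices in `features`."""
    if not features:
        return 0.0
    correct = 0
    for i in range(len(X)):
        best_dist, best_label = None, None
        for j in range(len(X)):
            if i == j:
                continue  # leave sample i out
            dist = sum((X[i][f] - X[j][f]) ** 2 for f in features)
            if best_dist is None or dist < best_dist:
                best_dist, best_label = dist, y[j]
        if best_label == y[i]:
            correct += 1
    return correct / len(X)


def forward_selection(X, y, score):
    """Greedy wrapper search: repeatedly add the attribute that most
    improves the score; stop when no attribute improves it."""
    remaining = list(range(len(X[0])))
    selected, best = [], 0.0
    while remaining:
        trials = [(score(X, y, selected + [f]), f) for f in remaining]
        top_score, top_feature = max(trials, key=lambda t: t[0])
        if top_score <= best:
            break
        selected.append(top_feature)
        remaining.remove(top_feature)
        best = top_score
    return selected, best


# Invented data: attribute 0 separates the classes, attribute 1 does not.
X = [[0.0, 5.0], [0.1, 1.0], [0.2, 9.0],
     [1.0, 5.1], [1.1, 1.1], [1.2, 9.1]]
y = [0, 0, 0, 1, 1, 1]

selected, best = forward_selection(X, y, loo_accuracy)
print(selected, best)
```

The search never looks inside the learner: it only calls `score`, so any other learning algorithm and evaluation scheme could be swapped in. The cost is that the learner is retrained for every candidate subset.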
Data Reduction: Histograms
• Text mining
• Image retrieval
• Microarray data analysis
• Protein classification
• Face recognition
• Handwritten digit recognition
• Intrusion detection
Data Reduction: Example
Data Transformation and Discretization