Data Preprocessing and Exploring in Data Mining
BIF 515
Neeru S Redhu
Data
Data set: a collection of data objects
A data object is also called a record, entry, entity, point, vector, pattern, event,
case, sample, or observation
Ordinal
Description: provides enough information to order objects
Examples: hardness of material, grades
Operations: median, percentile, rank correlation, sign test
Interval (Numeric, Quantitative)
Description: differences between values are meaningful, i.e., a unit of measurement exists
Examples: calendar dates, temperature in Celsius or Fahrenheit
Operations: mean, standard deviation, Pearson's correlation, t and F tests
Ratio (Numeric, Quantitative)
Description: both differences and ratios are meaningful
Examples: age, mass, length
Operations: geometric mean, harmonic mean, percent variation
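The pairing of attribute types with permitted operations can be sketched in a few lines of Python. This is only an illustration: the sample values and the grade ordering are assumptions made up for the example, and the standard-library `statistics` module is used for the summary statistics.

```python
import statistics

# Hypothetical sample values, chosen only for illustration.
grades = ["B", "A", "C", "B", "A"]        # ordinal: order matters, differences do not
temps_c = [20.0, 22.0, 19.0, 25.0, 24.0]  # interval: differences meaningful, no true zero
masses_kg = [2.0, 8.0, 4.0]               # ratio: true zero, so ratios are meaningful

# Ordinal: map each category to its rank, then take the middle rank (median).
grade_order = ["C", "B", "A"]             # assumed ordering, lowest to highest
ranks = sorted(grade_order.index(g) for g in grades)
median_grade = grade_order[ranks[len(ranks) // 2]]

# Interval: mean and standard deviation are meaningful.
mean_temp = statistics.mean(temps_c)
std_temp = statistics.stdev(temps_c)

# Ratio: geometric and harmonic means are meaningful as well.
geo_mass = statistics.geometric_mean(masses_kg)
har_mass = statistics.harmonic_mean(masses_kg)

print(median_grade, mean_temp, round(std_temp, 2), round(geo_mass, 2), round(har_mass, 2))
```

Note that a mean of the grade labels is never computed: for an ordinal attribute only the order is meaningful, so the median is used instead.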
Attributes by Number of values
Discrete
Binary
Continuous
Asymmetric Attributes
Asymmetric Binary
General Characteristics of Data sets
Dimensionality
Number of attributes that the objects in the data set possess
Sparsity
Most attributes of an object have a value of 0. This is an
advantage because only the non-zero values need to be stored and
processed, giving significant savings in computation time and storage
Resolution
Properties of the data differ at different resolutions, e.g.,
the Earth's surface, weather forecasting
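The storage and computation savings that sparsity offers can be illustrated with a minimal Python sketch. The record values and the helper name `sparse_dot` are hypothetical, introduced only for this example; real projects would more likely rely on a sparse-matrix library.

```python
# Hypothetical record in which most attributes are zero.
dense = [0, 0, 3, 0, 0, 0, 1, 0, 0, 0]

# Sparse form: keep only {attribute index: value} for the non-zero attributes.
sparse = {i: v for i, v in enumerate(dense) if v != 0}


def sparse_dot(a, b):
    """Dot product that visits only indices that are non-zero in both vectors."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter dictionary
    return sum(v * b[i] for i, v in a.items() if i in b)


other = {2: 2, 5: 4, 6: 5}  # a second hypothetical sparse record
result = sparse_dot(sparse, other)
print(sparse, result)
```

Here two stored entries stand in for ten attributes, and the dot product touches only the stored entries instead of all ten positions.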
Types of Data Sets
Record Data
Transaction or market basket Data
Data matrix
Sparse data matrix
Graph Based Data
Relationships among data objects
Data objects that are themselves graphs
Ordered Data
Sequential Data
Sequence Data
Time series Data
Spatial Data
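As an illustration of how the record data types listed above relate, transaction (market basket) data can be rewritten as a binary data matrix with one row per transaction and one column per item. The sketch below uses made-up transactions; the item names are assumptions for the example only.

```python
# Hypothetical market basket transactions: each is a set of purchased items.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers"},
]

# Columns of the data matrix: every distinct item, in a fixed (sorted) order.
items = sorted(set().union(*transactions))

# Rows of the data matrix: 1 if the item is in the transaction, else 0.
matrix = [[1 if item in t else 0 for item in items] for t in transactions]

print(items)
for row in matrix:
    print(row)
```

With many distinct items and few items per transaction, most entries of such a matrix are 0, which is why it is usually treated as a sparse data matrix.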
Data Quality
Data mining is often applied to data that was collected for
another purpose, or for future but unspecified applications