Machine Learning Lecture 4 data types
PREPROCESSING
WHAT IS DATA?
An attribute is a property or characteristic of an object
– Examples: eye color of a person, temperature, etc.
– Attribute is also known as variable, field, characteristic, or feature
A collection of attributes describes an object
– Object is also known as record, point, case, sample, entity, or instance
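The object/attribute vocabulary above can be sketched in a few lines of Python. This is only an illustration: the names and values in `people` are invented, and a list of dictionaries is just one of several reasonable ways to hold records.

```python
# Each data object (record, instance) is described by the same collection
# of attributes. The names and values below are made up for illustration.
people = [
    {"name": "Ana", "eye_color": "brown", "temperature_c": 36.6},
    {"name": "Ben", "eye_color": "blue", "temperature_c": 37.1},
]

# Attributes (variables, fields, features) are the keys shared by all objects.
attributes = list(people[0].keys())

# Reading one attribute across all objects yields a column of values.
eye_colors = [person["eye_color"] for person in people]

print(attributes)
print(eye_colors)
```

Each dictionary plays the role of one object, and each key plays the role of one attribute.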
TYPES OF ATTRIBUTES
DISCRETE AND CONTINUOUS ATTRIBUTES
Discrete Attribute
– Has only a finite or countably infinite set of values
– Examples: zip codes, counts, or the set of words in a
collection of documents
– Often represented as integer variables.
– Note: binary attributes are a special case of discrete
attributes
Continuous Attribute
– Has real numbers as attribute values
– Examples: temperature, height, or weight.
– Practically, real values can only be measured and
represented using a finite number of digits.
– Continuous attributes are typically represented as
floating-point variables.
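As a rough illustration of the distinction, the sketch below stores discrete attributes as integers and a continuous attribute as a float. The tiny document collection and the temperature readings are invented for the example; the float comparison shows why a finite number of digits matters in practice.

```python
import math
from collections import Counter

# Discrete attribute: word counts over a tiny, made-up document collection.
documents = ["data mining", "mining data sets", "data"]
word_counts = Counter(word for doc in documents for word in doc.split())

# Binary attribute (a special case of discrete): does the document contain "sets"?
contains_sets = [int("sets" in doc.split()) for doc in documents]

# Continuous attribute: temperatures stored as floating-point values.
temperatures_c = [36.6, 37.1, 38.0]

# Floats carry a finite number of digits, so exact equality can fail
# where a tolerance-based comparison succeeds.
total = 0.1 + 0.2
exactly_equal = total == 0.3
approximately_equal = math.isclose(total, 0.3)

print(word_counts["data"], contains_sets, exactly_equal, approximately_equal)
```

Comparing continuous values with a tolerance, as `math.isclose` does, is usually safer than testing for exact equality.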
TYPES OF DATA SETS
Record
– Data Matrix
– Document Data
– Transaction Data
Graph
– World Wide Web
– Molecular Structures
Ordered
– Temporal Data
– Sequential Data
– Genetic Sequence Data
RECORD DATA
DATA MATRIX
DOCUMENT DATA
TRANSACTION DATA
[Figure labels: item, transaction]
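One common way to work with transaction data is to treat each transaction as a set of items and, when a record-style view is needed, expand it into a binary matrix. The sketch below assumes that representation; the item names are invented and are not taken from the slide's figure.

```python
# Each transaction is a set of items (illustrative data only).
transactions = [
    {"bread", "milk"},
    {"bread", "eggs", "juice"},
    {"milk", "eggs"},
]

# Collect every distinct item in a stable, sorted order.
items = sorted(set().union(*transactions))

# Binary matrix: one row per transaction, one column per item,
# 1 if the item occurs in the transaction and 0 otherwise.
matrix = [[int(item in t) for item in items] for t in transactions]

print(items)
for row in matrix:
    print(row)
```

The matrix form makes transaction data usable by methods that expect fixed-length records, at the cost of many zero entries when the item catalogue is large.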
GRAPH DATA
CHEMICAL DATA
Benzene molecule: C6H6
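A molecule can be stored as graph data by treating atoms as nodes and bonds as edges. The following is a minimal sketch for benzene under simplifying assumptions: bond order is ignored, and the atom labels (`C1`, `H1`, and so on) are made up for the example.

```python
# Atoms are nodes; bonds are edges (bond order is ignored in this sketch).
carbons = [f"C{i}" for i in range(1, 7)]
hydrogens = [f"H{i}" for i in range(1, 7)]

edges = []
for i in range(6):
    edges.append((carbons[i], carbons[(i + 1) % 6]))  # carbon ring bond
    edges.append((carbons[i], hydrogens[i]))          # carbon-hydrogen bond

# Undirected adjacency sets: each atom maps to the atoms it is bonded to.
adjacency = {atom: set() for atom in carbons + hydrogens}
for a, b in edges:
    adjacency[a].add(b)
    adjacency[b].add(a)

print(len(adjacency), "atoms,", len(edges), "bonds")
print(sorted(adjacency["C1"]))
```

In this representation every carbon has three neighbours (two ring carbons and one hydrogen) and every hydrogen has one.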
ORDERED DATA
Sequences of transactions
Items/Events
An element of the sequence
ORDERED DATA
Genomic sequence data
ORDERED DATA
Spatio-Temporal Data
Average monthly temperature of land and ocean
Trajectories of moving objects
Spatial data: refers to the location-related aspects of data
NOISE
[Figure: two panels, "Two Sine Waves" and "Two Sine Waves + Noise"]
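The effect pictured in the figure can be reproduced approximately with the standard library. The frequencies (3 and 7 cycles), the sample count, and the Gaussian noise level of 0.5 are assumptions chosen for illustration; the slide does not state the values it used.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is repeatable

n = 200
t = [i / n for i in range(n)]

# Clean signal: the sum of two sine waves (frequencies are assumed).
clean = [math.sin(2 * math.pi * 3 * x) + math.sin(2 * math.pi * 7 * x) for x in t]

# Noisy signal: the same values with additive Gaussian noise (sigma assumed).
noisy = [y + random.gauss(0.0, 0.5) for y in clean]

# Root-mean-square difference between the clean and noisy signals.
rms_error = math.sqrt(sum((a - b) ** 2 for a, b in zip(clean, noisy)) / n)
print(round(rms_error, 3))
```

The RMS difference comes out close to the chosen noise standard deviation, which gives a simple way to quantify how far the noise has moved the measured values from the original ones.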
OUTLIERS
DEVIATION/ANOMALY DETECTION
Network Intrusion Detection
MISSING VALUES
DUPLICATE DATA
Examples:
– Same person with multiple email addresses
Data cleaning
– Process of dealing with duplicate data issues
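The email example can be turned into a small cleaning sketch. It assumes that two records refer to the same person when their names match after lowercasing and whitespace normalization; that matching rule, the `normalize` helper, and the sample records are all hypothetical, and real data usually needs more careful matching.

```python
# Illustrative records: the first two describe the same person
# under different email addresses.
records = [
    {"name": "Ana Silva", "email": "ana@example.com"},
    {"name": "ana silva ", "email": "a.silva@example.org"},
    {"name": "Ben Okoro", "email": "ben@example.com"},
]

def normalize(name):
    """Lowercase the name and collapse whitespace (assumed matching rule)."""
    return " ".join(name.lower().split())

# Merge records that share a normalized name, keeping every email seen.
merged = {}
for record in records:
    key = normalize(record["name"])
    merged.setdefault(key, {"name": key, "emails": []})
    merged[key]["emails"].append(record["email"])

people = list(merged.values())
for person in people:
    print(person)
```

Merging rather than dropping the duplicate keeps both email addresses, so no information is lost when the records are combined.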