03 ML Data Intro

The document discusses different types of data including: 1) Record data such as relational records and data matrices. 2) Non-record data like documents, graphs, ordered data (video, time-series), and spatial/image data. 3) Key characteristics of structured data include dimensionality, sparsity, resolution, and distribution. Data objects represent entities and are described by attributes which can be nominal, binary, ordinal, interval-scaled numeric, ratio-scaled numeric, discrete or continuous.

Uploaded by: In Tech
Copyright: © All Rights Reserved

Data & its Types

Introduction to Machine Learning

Dr. Hikmat Ullah Khan


Assistant Professor
COMSATS Institute of Information Technology,
Wah Cantt, Pakistan
Email: [email protected]

Types of Data Sets
• Record
  • Relational records
  • Data matrix
  • Document data: text documents
  • Transaction data
• Graph and network
  • World Wide Web
  • Social or information networks
  • Molecular structures
• Ordered
  • Video data: sequence of images
  • Temporal data: time-series
  • Sequential data: transaction sequences
  • Genetic sequence data
• Spatial, image and multimedia
  • Spatial data: maps
  • Image data
  • Video data

Document data (term counts per document):

            team  coach  play  ball  score  game  win  lost  timeout  season
Document 1     3      0     5     0      2     6    0     2        0       2
Document 2     0      7     0     2      1     0    0     3        0       0
Document 3     0      1     0     0      1     2    2     0        3       0

Transaction data:

TID  Items
1    Bread, Coke, Milk
2    Beer, Bread
3    Beer, Coke, Diaper, Milk
4    Beer, Bread, Diaper, Milk
5    Coke, Diaper, Milk
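As a rough illustration of two of the record types above, the sketch below builds a small term-count matrix and stores the slide's transaction table as Python sets. The toy sentences, the `vocabulary` list and the helper name `term_count_matrix` are assumptions made for this example and do not come from the lecture.

```python
from collections import Counter


def term_count_matrix(documents, vocabulary):
    """Return one row of term counts per document, in vocabulary order."""
    rows = []
    for text in documents:
        counts = Counter(text.lower().split())
        rows.append([counts[term] for term in vocabulary])
    return rows


# Hypothetical toy documents, invented for illustration only.
documents = [
    "team play play score",
    "coach coach ball lost",
]
vocabulary = ["team", "coach", "play", "ball", "score", "lost"]
matrix = term_count_matrix(documents, vocabulary)

# Transaction data from the slide: each transaction is a set of items.
transactions = {
    1: {"Bread", "Coke", "Milk"},
    2: {"Beer", "Bread"},
    3: {"Beer", "Coke", "Diaper", "Milk"},
    4: {"Beer", "Bread", "Diaper", "Milk"},
    5: {"Coke", "Diaper", "Milk"},
}

# Count the transactions that contain Milk.
milk_count = sum("Milk" in items for items in transactions.values())
```

Each row of `matrix` is one document and each column is one term, which is the same layout as the document table on the slide.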

Important Characteristics of Structured Data
• Dimensionality
  • Number of attributes (characteristics, features)
• Sparsity
  • Most values are zero, so only presence counts
• Resolution
  • Patterns depend on the scale and volume of the data (Big Data)
• Distribution
  • Centrality and dispersion
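These characteristics can be measured directly. The sketch below uses the document-term counts from the "Types of Data Sets" slide; defining sparsity as the fraction of zero entries, and summarizing only the first column, are choices made for this example rather than definitions given in the lecture.

```python
from statistics import mean, pstdev

# Term counts for Documents 1-3, as listed on the "Types of Data Sets" slide.
matrix = [
    [3, 0, 5, 0, 2, 6, 0, 2, 0, 2],
    [0, 7, 0, 2, 1, 0, 0, 3, 0, 0],
    [0, 1, 0, 0, 1, 2, 2, 0, 3, 0],
]

# Dimensionality: number of attributes (columns) per data object.
dimensionality = len(matrix[0])

# Sparsity (assumed definition): fraction of entries that are zero.
values = [v for row in matrix for v in row]
sparsity = sum(v == 0 for v in values) / len(values)

# Distribution of one attribute: centrality (mean) and dispersion
# (population standard deviation) of the first column.
first_column = [row[0] for row in matrix]
center = mean(first_column)
spread = pstdev(first_column)
```

Here half of the entries are zero, which is typical of document data and is why only the non-zero counts are usually stored.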
Data Objects
• Data sets are made up of data objects.
• A data object represents an entity.
• Examples:
  • sales database: customers, store items, sales
  • medical database: patients, treatments
  • university database: students, professors, courses
• Also called samples, examples, instances, data points, objects, tuples.
• Data objects are described by attributes.
• Database rows -> data objects; columns -> attributes.
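The row/column correspondence can be sketched in a few lines. The `Student` class, its field names and the two sample records are hypothetical and only stand in for one table of a university database.

```python
from dataclasses import dataclass, fields


@dataclass
class Student:
    """One data object (one database row). Field names are hypothetical."""
    student_id: int
    name: str
    department: str


# A tiny data set: each element is one data object.
students = [
    Student(1, "Ayesha", "CS"),
    Student(2, "Bilal", "EE"),
]

# The attributes are the columns shared by every data object.
attribute_names = [f.name for f in fields(Student)]
```

Each `Student` instance plays the role of a row, and `attribute_names` lists the columns.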
Attributes
• Attribute (or dimension, feature, variable): a data field representing a characteristic or feature of a data object.
  • E.g., customer_ID, name, address
• Types:
  • Nominal
  • Binary
  • Ordinal
  • Numeric:
    • Interval-scaled
    • Ratio-scaled

Attribute Types
• Nominal: categories, states, or “names of things”
  • Enum (enumeration)
  • Why enumerations?
  • Examples
    • Hair_color = {black, brown, grey, red, white}
    • University departments, engineering programs, occupation, zip codes
  • More examples?
  • Can we represent the values as numbers? Why? Why not?
  • Is order significant?
  • Can statistical formulas be applied?
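One way to explore the questions above is to encode a nominal attribute with two different, equally arbitrary numberings. The sample values and both code tables below are assumptions made for this sketch.

```python
from collections import Counter
from statistics import mean

# Hypothetical observations of the nominal attribute Hair_color.
hair_colors = ["black", "brown", "black", "grey", "red", "black"]

# Two arbitrary integer labelings of the same categories.
codes_a = {"black": 0, "brown": 1, "grey": 2, "red": 3, "white": 4}
codes_b = {"black": 4, "brown": 3, "grey": 2, "red": 1, "white": 0}

# The mean of the codes depends entirely on the labeling chosen.
mean_a = mean(codes_a[c] for c in hair_colors)
mean_b = mean(codes_b[c] for c in hair_colors)

# The most frequent category (mode) does not depend on the labeling.
mode_color, mode_count = Counter(hair_colors).most_common(1)[0]
```

The two means differ even though the data are identical, which suggests that numbers assigned to nominal values are only labels: order and arithmetic on them carry no meaning, while counting and the mode still do.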

Attribute Types
• Binary
  • Why do we use binary variables?
  • Nominal attribute with only 2 states (0 and 1)
  • Examples?
  • Symmetric binary: both outcomes equally important
    • e.g., gender
  • Asymmetric binary: outcomes not equally important
    • e.g., medical test (positive vs. negative)
    • Convention: assign 1 to the most important outcome (e.g., HIV positive)
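The difference between symmetric and asymmetric binary attributes shows up when two objects are compared. The sketch below contrasts the simple matching coefficient, which counts 0-0 agreements, with the Jaccard coefficient, which ignores them. Neither measure is named in the slides, and the two patient vectors are hypothetical.

```python
# Hypothetical test results for two patients: 1 = positive, 0 = negative.
patient_a = [1, 0, 0, 0, 1, 0]
patient_b = [1, 0, 0, 0, 0, 0]


def simple_matching(x, y):
    """Fraction of positions that agree; treats 0-0 and 1-1 matches alike."""
    matches = sum(a == b for a, b in zip(x, y))
    return matches / len(x)


def jaccard(x, y):
    """Shared positives divided by positions where either one is positive."""
    both = sum(a == 1 and b == 1 for a, b in zip(x, y))
    either = sum(a == 1 or b == 1 for a, b in zip(x, y))
    return both / either if either else 0.0


symmetric_similarity = simple_matching(patient_a, patient_b)
asymmetric_similarity = jaccard(patient_a, patient_b)
```

With asymmetric attributes, the many shared negatives inflate the simple matching score, so a measure that only looks at the important outcome (coded 1) is often preferred.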

Attribute Types
• Ordinal
  • Values have a meaningful order (ranking), but the magnitude between successive values is not known.
  • Examples
    • Size = {small, medium, large}
    • CGPA or grades
    • Designation rankings
  • Other examples?

Numeric Attribute Types
• Numeric: a quantity (integer or real-valued)
• Interval-scaled
  • Measured on a scale of equal-sized units
  • The distance between successive values is equal
    • E.g., 100 and 90 marks are as far apart as 50 and 40 marks
  • Values have order
  • E.g., temperature in °C or °F, calendar dates
  • Zero is an ordinary value on the scale, not the absence of the quantity (no true zero point)
  • Statistical formulas (e.g., mean, standard deviation) apply
  • Examples
    • Most of our everyday numeric values
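The sketch below illustrates the interval properties with calendar dates and marks. The specific dates are hypothetical, and Python's `datetime.date` is used only because it supports subtraction but not division.

```python
from datetime import date

# Hypothetical calendar dates chosen for illustration.
start_a, end_a = date(2020, 1, 1), date(2020, 1, 11)
start_b, end_b = date(2021, 6, 1), date(2021, 6, 11)

# Differences are meaningful: both gaps are the same number of days.
gap_a = (end_a - start_a).days
gap_b = (end_b - start_b).days

# The marks example from the slide: equal distances at different levels.
marks_gap_high = 100 - 90
marks_gap_low = 50 - 40

# A ratio of two dates has no meaning, and Python refuses to compute it.
try:
    end_a / start_a
    ratio_supported = True
except TypeError:
    ratio_supported = False
```

Subtraction works because the units are equal-sized. Division is undefined because the scale has no true zero point.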

Numeric Attribute Types
• Numeric: a quantity (integer or real-valued)
• Ratio-scaled
  • Count-based values: number of ?
  • Frequency-based or normalized values
  • Comparison-based values
    • E.g., Pak Rupees vs. Dollars
  • Inherent zero point (special definition of the zero point)
    • Zero means the absence of the quantity
  • Examples
    • Weight, height, Hb level, etc.
    • pH value…
  • Temperature: not always a ratio value
    • In °F or °C it is not a ratio value
    • In K it is a ratio value
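The temperature case can be checked numerically. In the sketch below the two sample temperatures are arbitrary, and the helper names `c_to_f` and `c_to_k` are introduced only for this example.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


def c_to_k(celsius):
    """Convert degrees Celsius to kelvin."""
    return celsius + 273.15


# Arbitrary sample temperatures in degrees Celsius.
warm_c, cool_c = 20.0, 10.0

# The "ratio" changes with the unit, so it is not meaningful in C or F.
ratio_c = warm_c / cool_c
ratio_f = c_to_f(warm_c) / c_to_f(cool_c)

# Kelvin has a true zero point, so this ratio is physically meaningful.
ratio_k = c_to_k(warm_c) / c_to_k(cool_c)
```

20 °C looks "twice" 10 °C, but the same two temperatures give a ratio of 1.36 in °F. Only the kelvin ratio, about 1.035, reflects a true zero point.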
Discrete vs. Continuous Attributes
• Discrete Attribute
  • Has a finite or countably infinite set of values
  • E.g., zip codes, number of departments, or the set of words in a collection of documents
  • Sometimes represented as integer variables
  • Note: binary attributes are a special case of discrete attributes
• Continuous Attribute
  • Has real numbers as attribute values
  • E.g., temperature, height, or weight
  • In practice, real values can only be measured and represented using a finite number of digits
  • Continuous attributes are typically represented as floating-point variables
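The finite-digits point can be seen in any language with floating-point numbers. The sketch below uses Python, and the variable names and sample values are assumptions made for this example.

```python
import math
from decimal import Decimal

# Discrete attribute: a count stored as an integer is represented exactly.
department_count = 12
department_count += 1

# Continuous attribute: a measurement stored as a floating-point number.
height_m = 1.75

# The float written as 0.1 is only the nearest binary approximation of 0.1.
stored_tenth = Decimal(0.1)
is_exact_tenth = stored_tenth == Decimal("0.1")

# Finite precision means exact equality tests on floats can fail.
naive_equal = (0.1 + 0.2 == 0.3)

# Comparing within a tolerance is the usual workaround.
close_enough = math.isclose(0.1 + 0.2, 0.3)
```

Integer values are stored exactly, while real values are rounded to the nearest representable number, so continuous attributes are best compared within a tolerance.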