
03 ML Data Intro

The document discusses different types of data including: 1) Record data such as relational records and data matrices. 2) Non-record data like documents, graphs, ordered data (video, time-series), and spatial/image data. 3) Key characteristics of structured data include dimensionality, sparsity, resolution, and distribution. Data objects represent entities and are described by attributes which can be nominal, binary, ordinal, interval-scaled numeric, ratio-scaled numeric, discrete or continuous.

Uploaded by

In Tech

Data & its Types

Introduction to Machine Learning

Dr. Hikmat Ullah Khan


Assistant Professor
COMSATS Institute of Information Technology,
Wah Cantt, Pakistan
Email: [email protected]

Types of Data Sets
 Record
 Relational records
 Data matrix
 Document data: text documents
 Transaction data
 Graph and network
 World Wide Web
 Social or information networks
 Molecular structures
 Ordered
 Video data: sequence of images
 Temporal data: time-series
 Sequential data: transaction sequences
 Genetic sequence data
 Spatial, image and multimedia
 Spatial data: maps
 Image data
 Video data

Document data (term counts per document):

            team  coach  play  ball  score  game  win  lost  timeout  season
Document 1     3      0     5     0      2     6    0     2        0       2
Document 2     0      7     0     2      1     0    0     3        0       0
Document 3     0      1     0     0      1     2    2     0        3       0

Transaction data:

TID  Items
1    Bread, Coke, Milk
2    Beer, Bread
3    Beer, Coke, Diaper, Milk
4    Beer, Bread, Diaper, Milk
5    Coke, Diaper, Milk
Important Characteristics of Structured Data
 Dimensionality
 Attributes/Characteristics/Features
 Sparsity
 Only presence counts
 Resolution
 Patterns depend on the scale/volume of data (Big Data)
 Distribution
 Centrality and dispersion
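The four characteristics can be made concrete on a single record. The sketch below uses the Document 1 row of term counts from the earlier document-data example and measures sparsity as the fraction of zero entries, which is one common reading and an assumption here. The variable names are illustrative.

```python
import statistics

# Document 1 row from the document-data example (term counts).
row = [3, 0, 5, 0, 2, 6, 0, 2, 0, 2]

# Dimensionality: the number of attributes describing the object.
dimensionality = len(row)

# Sparsity, taken here as the fraction of entries that are zero.
zero_fraction = row.count(0) / len(row)

# Distribution: centrality (mean, median) and dispersion (standard deviation).
mean = statistics.mean(row)
median = statistics.median(row)
spread = statistics.pstdev(row)

print(dimensionality, zero_fraction, mean, median, spread)
```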
Data Objects

 Data sets are made up of data objects.
 A data object represents an entity.
 Examples:
 sales database: customers, store items, sales
 medical database: patients, treatments
 university database: students, professors, courses
 Also called samples, examples, instances, data points, objects, tuples.
 Data objects are described by attributes.
 Database rows -> data objects; columns -> attributes.
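A minimal sketch of the row/column mapping, using a made-up university table. The field names and values are hypothetical and serve only to show that each row is one data object and each key is one attribute.

```python
# Hypothetical rows from a university database; every dict is one data object.
students = [
    {"student_id": 1, "name": "Ayesha", "department": "CS"},
    {"student_id": 2, "name": "Bilal", "department": "EE"},
]

# Columns -> attributes: the keys shared by every row.
attributes = list(students[0].keys())

# Rows -> data objects: one entry per student.
n_objects = len(students)

# Reading one attribute across all objects gives one column of the table.
departments = [s["department"] for s in students]

print(attributes, n_objects, departments)
```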
Attributes

 Attribute (or dimension, feature, variable): a data field representing a characteristic or feature of a data object.
 E.g., customer_ID, name, address
 Types:
 Nominal
 Binary
 Ordinal
 Numeric:
 Interval-scaled
 Ratio-scaled
Attribute Types
 Nominal: categories, states, or “names of things”
 Enum
 Why enumerations?
 Examples
 Hair_color = {black, brown, grey, red, white}
 University departments, engineering programs, occupation, zip codes
 More examples
 ?
 Can we represent values as numbers?
 Why? Why not?
 Is order significant?
 Can statistical formulas be applied?
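One way to explore the questions above is to encode the hair-color values as numbers. In the sketch below the integer codes are labels only, so their order and differences mean nothing; a one-hot vector avoids implying an order, and the mode is a statistic that still makes sense for nominal values. The sample list and the helper `one_hot` are assumptions made for this example.

```python
from collections import Counter

hair_colors = ["black", "brown", "grey", "red", "white"]

# Integer codes are arbitrary labels: "red" = 3 is not "more" than "brown" = 1.
code_of = {color: i for i, color in enumerate(hair_colors)}


def one_hot(color):
    """Encode a nominal value as a 0/1 vector with a single 1."""
    return [1 if c == color else 0 for c in hair_colors]


# Hypothetical sample of observed values.
sample = ["black", "brown", "black", "red", "black"]

# The mode (most frequent value) is meaningful for nominal data;
# a mean of the integer codes would not be.
mode = Counter(sample).most_common(1)[0][0]

print(code_of["red"], one_hot("grey"), mode)
```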

Attribute Types
 Binary
 Why do we use Binary variables?
 Nominal attribute with only 2 states (0 and 1)
 Examples:
 ?
 Symmetric binary: both outcomes equally important
 e.g., gender
 Asymmetric binary: outcomes not equally important.
 e.g., medical test (positive vs. negative)

 Convention: assign 1 to the most important outcome (e.g., HIV positive)
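To see why the symmetric/asymmetric distinction matters in practice, the sketch below compares two hypothetical patients whose test results are coded with 1 for positive. It contrasts two standard similarity measures that the slides do not name: simple matching, which counts shared 0s and shared 1s alike, and the Jaccard coefficient, which ignores shared 0s. The patient vectors are invented.

```python
def simple_matching(a, b):
    """Fraction of positions where both vectors agree (0-0 and 1-1 both count)."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def jaccard(a, b):
    """Shared 1s divided by positions where at least one vector has a 1."""
    both = sum(x == 1 and y == 1 for x, y in zip(a, b))
    either = sum(x == 1 or y == 1 for x, y in zip(a, b))
    return both / either if either else 0.0


# Hypothetical results of six medical tests (1 = positive, the important outcome).
patient_a = [1, 0, 0, 0, 0, 0]
patient_b = [0, 1, 0, 0, 0, 0]

print(simple_matching(patient_a, patient_b))
print(jaccard(patient_a, patient_b))
```

The two patients share no positive result, yet simple matching rates them as fairly similar because of the many shared negatives. For asymmetric attributes, a measure that ignores shared negatives is often the better fit.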

Attribute Types
 Ordinal
 Values have a meaningful order (ranking), but the magnitude between successive values is not known.
 Examples
 Size = {small, medium, large},
 CGPA or grades,
 designation rankings
 Other examples
 ?

Numeric Attribute Types
 NUMERIC / Quantity (integer or real-valued)
 Interval: most everyday numeric values
 Measured on a scale of equal-sized units
 Distance between values is equal
 100 and 90 marks are the same distance apart as 50 and 40
 Values have order
 E.g., temperature in °C or °F, calendar dates
 Zero is an arbitrary point on the scale, not the absence of the quantity (no true zero-point); differences are meaningful, ratios are not
 Statistical formulas such as mean and standard deviation apply
 Examples
 Almost all our everyday numeric values

Numeric Attribute Types
 NUMERIC / Quantity (integer or real-valued)
 Ratio
 Count-based values: number of ?
 Frequency-based or normalized values
 Comparison-based values
 Pak Rupees vs. Dollars
 Inherent zero-point (special definition of the zero point)
 Examples:
 Weight, height, HB-level, etc.
 Zero means the quantity does not exist.
 pH value…
 Temperature: not a ratio value?
 In °F or °C, it is not a ratio value.
 In K (Kelvin), it is a ratio value.
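The temperature question can be checked numerically. In the sketch below, equal differences stay equal when Celsius is converted to Fahrenheit, but the ratio "20 is twice 10" does not survive the conversion, which is why °C and °F are interval scales. A weight ratio is unchanged when kilograms become grams, as expected for a ratio scale, and the Kelvin ratio shows how far 20 °C is from being "twice as hot" as 10 °C. The sample values and the helper names `c_to_f` and `c_to_k` are chosen for this example.

```python
def c_to_f(c):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32


def c_to_k(c):
    """Convert degrees Celsius to kelvin."""
    return c + 273.15


# Interval property: equal differences stay equal after changing units.
diff_c_low = 20.0 - 10.0
diff_c_high = 50.0 - 40.0
diff_f_low = c_to_f(20.0) - c_to_f(10.0)
diff_f_high = c_to_f(50.0) - c_to_f(40.0)

# Ratios on an interval scale depend on the arbitrary zero, so they change.
ratio_c = 20.0 / 10.0
ratio_f = c_to_f(20.0) / c_to_f(10.0)

# On the Kelvin scale zero is a true zero, so this ratio is physically meaningful.
ratio_k = c_to_k(20.0) / c_to_k(10.0)

# Ratio scale: weight has a true zero, so the ratio survives a unit change.
ratio_kg = 4.0 / 2.0
ratio_g = 4000.0 / 2000.0

print(diff_f_low, diff_f_high)
print(ratio_c, ratio_f, ratio_k)
print(ratio_kg, ratio_g)
```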
Discrete vs. Continuous Attributes
 Discrete Attribute
 Has a finite or countably infinite set of values
 E.g., zip codes, number of departments, or the set of words in a collection of documents
 Sometimes represented as integer variables
 Note: binary attributes are a special case of discrete attributes
 Continuous Attribute
 Has real numbers as attribute values
 E.g., temperature, height, or weight
 In practice, real values can only be measured and represented using a finite number of digits
 Continuous attributes are typically represented as floating-point variables
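The finite-digits point shows up directly in code. In this sketch a discrete count is stored as an integer and a continuous measurement as a float. Because floats carry only finitely many binary digits, an exact equality test on computed values can fail, and a tolerance-based comparison is the usual workaround. The sample values are invented.

```python
import math

# Discrete attributes: exact integer values (hypothetical examples).
n_departments = 12
is_enrolled = 1          # binary: a special case of discrete

# Continuous attribute: stored as a floating-point approximation.
height_m = 1.75

# Finite precision: 0.1 and 0.2 have no exact binary representation.
total = 0.1 + 0.2
exactly_equal = (total == 0.3)
close_enough = math.isclose(total, 0.3)

print(type(n_departments).__name__, type(height_m).__name__)
print(exactly_equal, close_enough)
```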
