ML-Lecture-4-data
Lecture 4: Data
COURSE CODE: CSE451
2023
Course Teacher
Dr. Mrinal Kanti Baowaly
Associate Professor
Department of Computer Science and
Engineering, Bangabandhu Sheikh
Mujibur Rahman Science and
Technology University, Bangladesh.
Email: [email protected]
DATA
Data can be any unprocessed fact, value, text, sound, picture or video
that has not yet been interpreted or analyzed
Data is the most important ingredient of Data Mining, Machine Learning
and Artificial Intelligence
Without data, we cannot train any model, and modern research and
automation would be in vain
Big enterprises spend huge amounts of money just to gather as much
relevant data as possible
Example: Facebook acquired WhatsApp for a huge price of $19
billion
Information and Knowledge
Information: Processed, organized, or structured data to provide context
and meaning.
Knowledge: Combination of inferred information, experiences, learning
and insights. Knowledge is useful and actionable information that can
lead to impact.
Machine Learning is a tool for turning information into knowledge
Types of Data (Variable) in Statistics
Quantitative data vs Qualitative data
Quantitative data
◦ Number-based, countable, or measurable, also known as numerical data
◦ Tells us how many, how much, or how often, and can be used in calculations
◦ Analyzed using statistical methods
◦ Examples: measurements such as distance, area, time, speed, height, length, weight,
cost; counts such as the number of website visitors, sales, or email sign-ups
Qualitative data
◦ Interpretation-based and descriptive; expressed in words or labels rather than
measured or counted; also known as categorical data
◦ Analyzed by grouping it into meaningful categories
◦ Can help us understand the why, how, and what behind certain behaviors
◦ Examples: Employee ID, text, documents, color, marital status, nationality, gender,
grades, education level, etc.
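To make the contrast concrete, here is a minimal sketch using only the Python standard library. It treats the Taxable Income column of the tax-record table in this lecture as quantitative data and the Marital Status column as qualitative data: the numbers are summarised with statistics, while the categories are grouped and counted. The variable names are illustrative choices, not part of the lecture.

```python
from collections import Counter
from statistics import mean, median

# Quantitative (numerical) attribute: Taxable Income in thousands,
# taken from records Tid 1..10 of the example table
taxable_income_k = [125, 100, 70, 120, 95, 60, 220, 85, 75, 90]

# Qualitative (categorical) attribute: Marital Status of the same records
marital_status = ["Single", "Married", "Single", "Married", "Divorced",
                  "Married", "Divorced", "Single", "Married", "Single"]

# Numbers support arithmetic, so statistical summaries are meaningful
income_mean = mean(taxable_income_k)
income_median = median(taxable_income_k)

# Categories are analyzed by grouping and counting, not by arithmetic
status_counts = Counter(marital_status)

print(f"Mean income: {income_mean}K, median income: {income_median}K")
print(dict(status_counts))
```

Computing a "mean marital status" would be meaningless, which is exactly why the two kinds of data call for different analysis.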
Discrete data vs Continuous data
Discrete Data
◦ Can be counted
◦ Has only a finite or countably infinite set of values
◦ Examples: the number of students in a class, the number of words in a document,
the number of heads in 100 coin flips
◦ Often represented as integer variables.
Continuous Data
◦ Can only be measured
◦ Can take any real value within a range
◦ Examples: temperature, height, or weight
◦ Often represented as real or floating-point variables
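A small Python sketch of the distinction, using simulated values (the 100 coin flips and the 150 to 190 cm height range are illustrative assumptions, not data from the lecture): a count naturally comes out as an `int`, while a measurement comes out as a `float`.

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

# Discrete: the number of heads in 100 simulated coin flips is a count,
# so it is an integer from the finite set {0, 1, ..., 100}
heads = sum(1 for _ in range(100) if random.random() < 0.5)

# Continuous: simulated heights in cm can take any real value in a range,
# so they are stored as floating-point numbers
heights_cm = [random.uniform(150.0, 190.0) for _ in range(5)]

print(type(heads).__name__, heads)
print(type(heights_cm[0]).__name__, [round(h, 2) for h in heights_cm])
```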
Nominal data vs Ordinal data
Nominal Data
◦ Qualitative or categorical data
◦ Cannot be quantified and has no implicit ordering
◦ No numeric operations can be performed
◦ Examples: Hair color (White, Red, Brown, Black, etc.), Marital status (Single,
Widowed, Married), Nationality (Indian, German, American), Gender (Male, Female,
Others), Eye color (Black, Brown, etc.)
Ordinal Data
◦ Qualitative or categorical data
◦ Has a natural ranked order, so numbers can be assigned to the categories to reflect that order
◦ It is possible to compare one item with another in terms of ranking.
◦ Examples: Grades in the exam (A, B, C, D, etc.), Ranking in a competition (First,
Second, Third, etc.), Economic Status (High, Medium, and Low), Education Level
(Higher, Secondary, Primary)
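In machine learning the two kinds of categorical data are usually encoded differently before training. The sketch below follows a common convention rather than a method prescribed in this lecture: it one-hot encodes a nominal attribute (Nationality) and maps an ordinal attribute (Education Level) to order-preserving integer ranks. The helper names `one_hot` and `ordinal_encode` are hypothetical.

```python
# Nominal: categories have no order, so each gets its own 0/1 indicator
# (one-hot encoding); no category is "larger" than another.
def one_hot(value, categories):
    """Return a 0/1 vector with a single 1 at the position of `value`."""
    return [1 if value == c else 0 for c in categories]

# Ordinal: categories have a rank, so integer codes that preserve the order
# are meaningful for comparison (differences between codes are not).
EDUCATION_ORDER = ["Primary", "Secondary", "Higher"]
EDUCATION_RANK = {level: rank for rank, level in enumerate(EDUCATION_ORDER)}

def ordinal_encode(value):
    """Map an education level to its rank: Primary=0, Secondary=1, Higher=2."""
    return EDUCATION_RANK[value]

NATIONALITIES = ["Indian", "German", "American"]
print(one_hot("German", NATIONALITIES))                      # [0, 1, 0]
print(ordinal_encode("Higher") > ordinal_encode("Primary"))  # True
```

Giving the nominal Nationality values integer codes 0, 1, 2 instead would silently impose an order that the data does not have.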
What is a Data Set?
A collection of data objects and their attributes
An attribute is a property or characteristic of an object
◦ Examples: eye color of a person, temperature, etc.
◦ Attribute is also known as variable, field, characteristic, or feature
A collection of attributes describes an object
◦ Object is also known as record, point, case, sample, entity, or instance
Example (columns are attributes, rows are objects):

Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes
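One way to hold the example data set above in code is a list of records: each list item is an object and each field is an attribute. This is a minimal sketch assuming Python's `dataclasses`; the snake_case field names (e.g. `taxable_income_k`, with income stored in thousands) are our own renaming of the table headers.

```python
from dataclasses import dataclass, fields

@dataclass
class TaxRecord:
    """One object (row) of the data set; each field is one attribute (column)."""
    tid: int
    refund: str
    marital_status: str
    taxable_income_k: int  # Taxable Income in thousands, e.g. 125K -> 125
    cheat: str

# The ten objects of the example data set
records = [
    TaxRecord(1, "Yes", "Single", 125, "No"),
    TaxRecord(2, "No", "Married", 100, "No"),
    TaxRecord(3, "No", "Single", 70, "No"),
    TaxRecord(4, "Yes", "Married", 120, "No"),
    TaxRecord(5, "No", "Divorced", 95, "Yes"),
    TaxRecord(6, "No", "Married", 60, "No"),
    TaxRecord(7, "Yes", "Divorced", 220, "No"),
    TaxRecord(8, "No", "Single", 85, "Yes"),
    TaxRecord(9, "No", "Married", 75, "No"),
    TaxRecord(10, "No", "Single", 90, "Yes"),
]

# Attribute names come from the class fields; objects are the list items
attributes = [f.name for f in fields(TaxRecord)]
print(len(records), "objects,", len(attributes), "attributes:", attributes)

# Selecting objects by the value of one attribute
cheaters = [r.tid for r in records if r.cheat == "Yes"]
print("Tids with Cheat = Yes:", cheaters)
```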
Types of Data Sets
1. Record
◦ Data Matrix
◦ Document Data
◦ Transaction Data
2. Graph
◦ Generic
◦ World Wide Web
◦ Molecular Structures
3. Ordered
◦ Sequential Transaction Data
◦ Time Series Data
◦ Sequence Data
◦ Spatial and Spatio-Temporal Data
1. Record Data
Data that consists of a collection of records, each of which consists
of a fixed set of attributes
Example: the tax-record table shown earlier (Tid, Refund, Marital Status, Taxable Income, Cheat) is record data
Document data (document-term matrix): each document is a record whose attributes are term counts

            team  coach  play  ball  score  game  win  lost  timeout  season
Document 1  3     0      5     0     2      6     0    2     0        2
Document 2  0     7      0     2     1      0     0    3     0        0
Document 3  0     1      0     0     1      2     2    0     3        0
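A document-term matrix like the one above can be built by counting how often each vocabulary term occurs in each document. The sketch below is a minimal illustration assuming naive whitespace tokenization; the mini-corpus, the vocabulary, and the helper name `document_term_matrix` are hypothetical and do not reproduce the documents behind the lecture's matrix.

```python
from collections import Counter

def document_term_matrix(documents, vocabulary):
    """Return one row per document holding the count of each vocabulary term."""
    rows = []
    for text in documents:
        counts = Counter(text.lower().split())  # naive whitespace tokenization
        rows.append([counts[term] for term in vocabulary])  # missing term -> 0
    return rows

# Hypothetical mini-corpus and vocabulary, for illustration only
vocabulary = ["team", "coach", "play", "ball", "score", "game", "win", "lost"]
documents = [
    "team play game game score",
    "coach ball lost lost",
    "team win win game",
]

matrix = document_term_matrix(documents, vocabulary)
for i, row in enumerate(matrix, start=1):
    print(f"Document {i}", row)
```

Every row has the same fixed set of attributes (one per vocabulary term), which is what makes document data a kind of record data.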
Transaction Data
A special type of record data, where
◦ each record (transaction) involves a set of items.
◦ For example, consider a grocery store. The set of products purchased by a
customer during one shopping trip constitutes a transaction, while the
individual products that were purchased are the items.
TID Items
1 Bread, Coke, Milk
2 Beer, Bread
3 Beer, Coke, Diaper, Milk
4 Beer, Bread, Diaper, Milk
5 Coke, Diaper, Milk
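Because each transaction is a set of items, a natural representation is a Python `set` per transaction. The sketch below uses the five transactions from the table above and counts how many transactions contain a given itemset (its support count), the quantity that association-rule mining is typically built on. The helper name `support_count` is our own.

```python
# Each transaction is stored as a set of items (order and duplicates ignored)
transactions = {
    1: {"Bread", "Coke", "Milk"},
    2: {"Beer", "Bread"},
    3: {"Beer", "Coke", "Diaper", "Milk"},
    4: {"Beer", "Bread", "Diaper", "Milk"},
    5: {"Coke", "Diaper", "Milk"},
}

def support_count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for items in transactions.values() if itemset <= items)

print(support_count({"Milk"}, transactions))            # 4
print(support_count({"Diaper", "Milk"}, transactions))  # 3
print(support_count({"Beer", "Diaper"}, transactions))  # 2
```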
2. Graph Data
Examples: a generic graph, linked web pages/social networks, and a
molecule
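Graph data is commonly stored as an adjacency list that maps each node to its neighbors. The sketch below builds an undirected graph with weighted edges; the nodes, edges, and weights are hypothetical, and `build_graph` is an illustrative helper name. For directed data such as hyperlinks between web pages, one would store each edge in one direction only.

```python
from collections import defaultdict

def build_graph(edges):
    """Build an undirected weighted graph as an adjacency list: node -> {neighbor: weight}."""
    graph = defaultdict(dict)
    for u, v, w in edges:
        graph[u][v] = w
        graph[v][u] = w  # undirected: store the edge in both directions
    return graph

# Hypothetical small generic graph with weighted edges
edges = [("A", "B", 2), ("B", "C", 5), ("A", "C", 1), ("C", "D", 2)]
g = build_graph(edges)

print(sorted(g["C"].items()))                      # [('A', 1), ('B', 5), ('D', 2)]
print(sum(len(nbrs) for nbrs in g.values()) // 2)  # 4 edges
```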