1 Unit-1

The document outlines the types of data relevant to machine learning, including qualitative (categorical) and quantitative (numeric) data, along with their subcategories. It emphasizes the importance of data exploration, quality assessment, and pre-processing steps before applying machine learning algorithms. Additionally, it discusses various learning paradigms such as supervised, unsupervised, and reinforcement learning, highlighting their applications in fields like healthcare and finance.

Uploaded by

mukeshbiknalikar05


UNIT-I

Types of data, exploring structure of data: exploring and plotting
numerical data, categorical data and relationship between variables,
data quality and remediation, data pre-processing: dimensionality
reduction and feature selection.
Types of data
Learning Objectives:

• To understand the incoming data

• Basic understanding about the nature and quality
of the data.
Recap:
• Types of machine learning –
supervised, unsupervised, and reinforcement.
• Supervised learning: learning from past data (training
data), known values (classes).
• Supervised learning - guided learning from human
inputs.
Example :
Medical Data, Dataset: Disease diagnosis using patient
records.
Features: Medical test results, symptoms, patient history.
Labels: Diagnoses (e.g., diabetic, non-diabetic).
Dataset: X-ray image classification (e.g., pneumonia
detection).
Features: X-ray images.
Labels: Presence or absence of disease.
• Unsupervised machine learning: has no labelled
data to learn from.
• It finds patterns in unlabelled data, e.g. by grouping similar records.
• This learning is not guided by labelled inputs; it relies on
the structure found in the data itself.
Example :
Customer Behavior Data
•Dataset: E-commerce user behavior.
• Features: Browsing history, clickstream data, time
spent on pages.
• Task: Group users with similar purchasing patterns.
•Dataset: Social media interactions.
• Features: Likes, shares, and network connections.
• Task: Identify communities or influencers.
• Reinforcement learning: the machine tries to
learn by itself through a penalty/reward mechanism,
much in the same way as human self-learning happens.
• Applications of machine learning span different domains
such as banking and finance, insurance, and healthcare.
• Fraud detection is a critical business case which is
implemented in almost all banks across the world and
uses machine learning predominantly.
• Risk prediction for new customers is a similar critical
case in the insurance industry which finds the
application of machine learning.
• In the healthcare sector, disease prediction makes wide
use of machine learning, especially in the developed
countries.
Points to Ponder

No man is perfect. The same is applicable for machines. To increase the level
of accuracy of a machine, human participation should be added to the
machine learning process. In short, incorporating human intervention is the
recipe for the success of machine learning.
MACHINE LEARNING ACTIVITIES

• The first step in machine learning activity starts with data.
• In case of supervised learning, it is the labelled training
data set followed by test data which is not labelled.
• In case of unsupervised learning, there is no question
of labelled data but the task is to find patterns in the
input data.
• A thorough review and exploration of the data is
needed to understand:
  - the type of the data,
  - the quality of the data, and
  - the relationship between the different data elements.
• Based on that, multiple pre-processing activities may
need to be done on the input data before we can go
ahead with core machine learning activities.
• Following are the typical preparation activities done
once the input data comes into the machine learning
system:
• Understand the type of data in the given input data set.
• Explore the data to understand the nature and quality.
• Explore the relationships amongst the data elements,
e.g. inter-feature relationship.
• Find potential issues in data.
• Do the necessary remediation, e.g. impute missing data
values, etc., if needed.
• Apply pre-processing steps, as necessary.
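The remediation step can be sketched in a few lines. This is a minimal illustration, not a prescribed method: the records, the attribute names, and the choice of mean imputation are all assumptions made for the example.

```python
from statistics import mean

# Hypothetical records; None marks a missing value.
records = [
    {"name": "Asha", "age": 20, "marks": 71.0},
    {"name": "Ravi", "age": None, "marks": 64.0},
    {"name": "Meera", "age": 22, "marks": None},
    {"name": "John", "age": 21, "marks": 81.0},
]

def count_missing(rows):
    """Count the missing values in each attribute (a basic data quality check)."""
    counts = {}
    for row in rows:
        for key, value in row.items():
            counts[key] = counts.get(key, 0) + (value is None)
    return counts

def impute_mean(rows, attribute):
    """Return a copy of rows in which missing values of `attribute`
    are replaced by the mean of the observed values."""
    observed = [r[attribute] for r in rows if r[attribute] is not None]
    fill = mean(observed)
    return [
        {**r, attribute: fill if r[attribute] is None else r[attribute]}
        for r in rows
    ]

missing = count_missing(records)
cleaned = impute_mean(impute_mean(records, "age"), "marks")
print(missing)
print(cleaned[1]["age"], cleaned[2]["marks"])
```

Mean imputation only makes sense for numeric attributes; for a categorical attribute the most frequent value would be a more natural fill.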
• Once the data is prepared for modelling, then the
learning tasks start off.
• As a part of it, do the following activities:
• The input data is first divided into parts – the training
data and the test data (called holdout). This step is
applicable for supervised learning only.
• Consider different models or learning algorithms for
selection. Train the model based on the training data for
supervised learning problem and apply to unknown
data.
• Directly apply the chosen unsupervised model on the
input data for unsupervised learning problem.
• After the model is selected, trained (for supervised
learning), and applied on the input data:
  - The performance of the model is evaluated.
  - Based on the options available, specific actions can be
taken to improve the performance of the model, if
possible.
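The supervised part of these activities (holdout split, training, applying the model, evaluating it) can be sketched as follows. The data set, the 30% test fraction, and the nearest-neighbour model are assumptions chosen only to keep the example small; they are not taken from the slides.

```python
import random

# Hypothetical labelled data: (hours_studied, attendance_percent) -> label.
data = [
    ((2.0, 40.0), "fail"), ((3.0, 55.0), "fail"), ((1.0, 35.0), "fail"),
    ((2.5, 50.0), "fail"), ((8.0, 90.0), "pass"), ((7.0, 85.0), "pass"),
    ((9.0, 95.0), "pass"), ((6.5, 80.0), "pass"), ((1.5, 45.0), "fail"),
    ((7.5, 88.0), "pass"),
]

def holdout_split(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of the rows and split it into (training, test) parts."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    n_test = max(1, round(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]

def predict_1nn(train, features):
    """Predict the label of the closest training record (squared Euclidean distance)."""
    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    nearest = min(train, key=lambda row: distance(row[0], features))
    return nearest[1]

def accuracy(train, test):
    """Fraction of test records whose predicted label matches the true label."""
    correct = sum(predict_1nn(train, f) == label for f, label in test)
    return correct / len(test)

train, test = holdout_split(data)
print(len(train), len(test), accuracy(train, test))
```

The test labels are used only to score the predictions, never to make them, which is what keeps the holdout evaluation honest.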
Table 2.1 contains a summary of steps and activities
involved:
2.3 BASIC TYPES OF DATA IN MACHINE LEARNING

• Before starting with types of data, let’s first understand what a
data set is and what are the elements of a data set.
• A data set is a collection of related information or records. The
information may be on some entity or some subject area.
• For example, we may have a data set on students in which each
record consists of information about a specific student.
• Again, we can have a data set on student performance which has
records providing performance, i.e. marks on the individual
subjects.
• Each row of a data set is called a record. Each data set also has
multiple attributes, each of which gives information on a specific
characteristic.
• For example, in the data set on students, there are four attributes namely Roll Number, Name,
Gender, and Age, each of which understandably is a specific characteristic about the student
entity.
• Attributes can also be termed as feature, variable, dimension or field.
• Both the data sets, Student and Student Performance, have four
features or dimensions; hence they are said to have a four-dimensional
data space.
• A row or record represents a point in the four-dimensional data space
as each row has specific values for each of the four attributes or
features.
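A small sketch of these terms, using the four Student attributes named above. The records themselves are invented for illustration.

```python
from dataclasses import dataclass, astuple, fields

@dataclass
class Student:
    """One record of the Student data set; each field is an attribute (feature)."""
    roll_number: int
    name: str
    gender: str
    age: int

# Illustrative records only; the values are made up.
students = [
    Student(101, "Asha", "F", 14),
    Student(102, "Ravi", "M", 15),
    Student(103, "Meera", "F", 14),
]

attributes = [f.name for f in fields(Student)]
dimensions = len(attributes)      # size of the data space
point = astuple(students[0])      # one record = one point in that space

print(attributes)
print(dimensions)
print(point)
```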
• The value of an attribute, quite understandably, may vary from record to
record.
Now that the context of data sets is given, let’s try to
understand the different types of data that we generally come
across in machine learning problems. Data can broadly be
divided into the following two types:
1. Qualitative data
2. Quantitative data

Qualitative data provides information about the quality of
an object, or information which cannot be measured. For
example, if we consider the quality of performance of students
in terms of ‘Good’, ‘Average’, and ‘Poor’, it falls under the
category of qualitative data. Also, the name or roll number of
a student is information that cannot be measured using a
scale of measurement, so it also falls under qualitative
data. Qualitative data is also called categorical data.
Qualitative data can be further subdivided into two types as
follows:
1. Nominal data
2. Ordinal data
Nominal data is one which has no numeric value, but a
named value. It is used for assigning named values to
attributes. Nominal values cannot be quantified. Examples of
nominal data are
1. Blood group: A, B, O, AB, etc.
2. Nationality: Indian, American, British, etc.
3. Gender: Male, Female, Other
Clearly, mathematical operations such as addition,
subtraction, multiplication, etc. cannot be performed on
nominal data. For that reason, statistical functions such as
mean, variance, etc. cannot be applied to nominal data either.
However, a basic count is possible. So the mode, i.e. the most
frequently occurring value, can be identified for nominal data.
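As a brief illustration, counting works on nominal values while arithmetic does not. The blood group sample below is invented.

```python
from collections import Counter

# Hypothetical sample of a nominal attribute.
blood_groups = ["A", "O", "B", "O", "AB", "O", "A"]

# Counting is allowed, so the mode can be found.
counts = Counter(blood_groups)
mode_value, mode_count = counts.most_common(1)[0]

# Arithmetic is not defined for named values, so a mean cannot be computed.
try:
    sum(blood_groups) / len(blood_groups)
    mean_is_defined = True
except TypeError:
    mean_is_defined = False

print(mode_value, mode_count, mean_is_defined)
```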
Ordinal data, in addition to possessing the properties of
nominal data, can also be naturally ordered. This means
ordinal data also assigns named values to attributes but unlike
nominal data, they can be arranged in a sequence of increasing
or decreasing value so that we can say whether a value is
better than or greater than another value. Examples of ordinal
data are
1. Customer satisfaction: ‘Very Happy’, ‘Happy’, ‘Unhappy’, etc.
2. Grades: A, B, C, etc.
3. Hardness of Metal: ‘Very Hard’, ‘Hard’, ‘Soft’, etc.

Like nominal data, basic counting is possible for ordinal
data. Hence, the mode can be identified. Since ordering is
possible in case of ordinal data, the median and quartiles can
also be identified. The mean still cannot be calculated.
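One way to find the median of ordinal data is to map each named value to its position in the ordering. This is a sketch under assumptions: the survey responses are invented, and the three levels come from the customer satisfaction example.

```python
from statistics import median_low

# Levels listed from lowest to highest. The ranks carry order only;
# the gaps between them have no numeric meaning.
levels = ["Unhappy", "Happy", "Very Happy"]
rank = {level: i for i, level in enumerate(levels)}

# Hypothetical survey responses.
responses = ["Happy", "Very Happy", "Unhappy", "Happy",
             "Very Happy", "Happy", "Unhappy"]

ranks = sorted(rank[r] for r in responses)

# median_low always returns one of the observed ranks,
# so it can be mapped back to a named level.
median_level = levels[median_low(ranks)]

print(median_level)
```

Averaging the ranks would treat the step from ‘Unhappy’ to ‘Happy’ as equal to the step from ‘Happy’ to ‘Very Happy’, which ordinal data does not guarantee.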
Quantitative data relates to information about the quantity
of an object – hence it can be measured. For example, if we
consider the attribute ‘marks’, it can be measured using a scale
of measurement. Quantitative data is also termed as numeric
data. There are two types of quantitative data:
1. Interval data
2. Ratio data
Interval data is numeric data for which not only the order
is known, but the exact difference between values is also
known. An ideal example of interval data is Celsius
temperature. The difference between each value remains the
same in Celsius temperature. For example, the difference
between 12°C and 18°C is measurable and is 6°C, the same as
the difference between 15.5°C and 21.5°C. Other
examples include date, time, etc.
For interval data, mathematical operations such as addition
and subtraction are possible. For that reason, for interval data,
the central tendency can be measured by mean, median, or
mode. Standard deviation can also be calculated.
However, interval data do not have a ‘true zero’ value. For
example, 0°C does not mean ‘no temperature’; it is only a
reference point on the scale. Hence, only addition and
subtraction apply to interval data; ratios are not
meaningful. This means we can say that a temperature of 40°C
is 20°C warmer than a temperature of 20°C.
However, we cannot say that a temperature of 40°C is
twice as hot as a temperature of 20°C.
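A short numeric check of this point. Converting to Kelvin, a scale that does have a true zero, shows why the ratio of two Celsius readings is misleading. The helper name `celsius_to_kelvin` is introduced here only for illustration.

```python
def celsius_to_kelvin(celsius):
    """Convert Celsius to Kelvin, a temperature scale with a true zero."""
    return celsius + 273.15

# Differences are meaningful on an interval scale.
diff_a = 18.0 - 12.0
diff_b = 21.5 - 15.5

# The ratio of the raw Celsius readings suggests "twice as hot" ...
naive_ratio = 40.0 / 20.0

# ... but on a scale with a true zero the ratio is close to 1.
kelvin_ratio = celsius_to_kelvin(40.0) / celsius_to_kelvin(20.0)

print(diff_a, diff_b)
print(naive_ratio, round(kelvin_ratio, 3))
```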
Ratio data represents numeric data for which the exact value
can be measured and for which an absolute zero is available.
These variables can be added, subtracted, multiplied, or
divided. The central tendency can be measured by mean,
median, or mode, and dispersion by measures such as standard
deviation. Examples of ratio data include height, weight, age,
salary, etc.
Figure 2.4 gives a summarized view of different types of
data that we may find in a typical machine learning problem.
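The rules stated in this section can also be gathered into a small lookup table. The dictionary below is only a restatement of the text in code form; listing quartiles under interval and ratio data is an assumption that follows from those types being ordered.

```python
# Statistics that are meaningful for each type of data, as described above.
PERMITTED = {
    "nominal": {"mode"},
    "ordinal": {"mode", "median", "quartiles"},
    "interval": {"mode", "median", "quartiles", "mean", "standard deviation"},
    "ratio": {"mode", "median", "quartiles", "mean", "standard deviation",
              "ratio"},
}

def is_permitted(data_type, statistic):
    """Return True if `statistic` is meaningful for the given type of data."""
    return statistic in PERMITTED[data_type]

print(is_permitted("ordinal", "median"))
print(is_permitted("nominal", "mean"))
print(is_permitted("interval", "ratio"))
```

Each type permits everything the previous one does and adds something new, which is why the four types are often presented as a progression.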
Apart from the approach detailed above, attributes can also
be categorized into types based on the number of values that
can be assigned. The attributes can be either discrete or
continuous based on this factor.
Discrete attributes can assume a finite or countably infinite
number of values. Nominal attributes such as roll number,
street number, pin code, etc. can have a finite number of
values whereas numeric attributes such as count, rank of
students, etc. can have countably infinite values. A special
type of discrete attribute which can assume only two values is
called a binary attribute. Examples of binary attributes include
male/female, positive/negative, yes/no, etc.
Continuous attributes can assume any possible value which
is a real number. Examples of continuous attribute include
length, height, weight, price, etc.
Note:
In general, nominal and ordinal attributes are discrete. On
the other hand, interval and ratio attributes are continuous,
barring a few exceptions, e.g. ‘count’ attribute.
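A rough heuristic for labelling an attribute from a sample of its values is sketched below. It is an assumption-laden illustration: the function name `attribute_kind` is hypothetical, and the way a value happens to be stored (for example as a float) is only a hint, not proof of the attribute's true nature.

```python
def attribute_kind(values):
    """Heuristically label an attribute as binary, continuous, or discrete."""
    distinct = set(values)
    if len(distinct) == 2:
        return "binary"          # special case of a discrete attribute
    if any(isinstance(v, float) for v in distinct):
        return "continuous"      # real-valued measurements
    return "discrete"            # names, codes, counts, ranks

# Hypothetical samples.
print(attribute_kind(["yes", "no", "yes", "no"]))
print(attribute_kind([1.62, 1.75, 1.80, 1.68]))
print(attribute_kind([560001, 560002, 560003]))
```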
