4 Types of Classification Tasks in Machine Learning
Machine learning is a field of study concerned with algorithms that learn from examples.
Classification is a task that requires the use of machine learning algorithms that learn
how to assign a class label to examples from the problem domain. An easy-to-understand example is classifying emails as “spam” or “not spam.”
There are many different types of classification tasks that you may encounter in machine
learning and specialized approaches to modeling that may be used for each.
In this tutorial, you will discover different types of classification predictive modeling in
machine learning.
Tutorial Overview
This tutorial is divided into five parts; they are:
Classification Predictive Modeling
Binary Classification
Multi-Class Classification
Multi-Label Classification
Imbalanced Classification
Classification Predictive Modeling
A model will use the training dataset and will calculate how to best map examples of
input data to specific class labels. As such, the training dataset must be sufficiently
representative of the problem and have many examples of each class label.
Class labels are often string values, e.g. “spam,” “not spam,” and must be mapped to
numeric values before being provided to an algorithm for modeling. This is often referred
to as label encoding, where a unique integer is assigned to each class label, e.g. “spam”
= 0, “not spam” = 1.
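As a minimal sketch, label encoding can be performed with scikit-learn’s LabelEncoder (an assumed choice of tool; note that it assigns integers in sorted order of the labels, so the exact mapping may differ from the example above):

from sklearn.preprocessing import LabelEncoder

# illustrative string class labels
labels = ['spam', 'not spam', 'spam', 'not spam']
encoder = LabelEncoder()
encoded = encoder.fit_transform(labels)
print(encoder.classes_)  # ['not spam' 'spam']
print(encoded)           # [1 0 1 0]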
There are many different types of classification algorithms for modeling classification
predictive modeling problems.
There is no good theory on how to map algorithms onto problem types; instead, it is
generally recommended that a practitioner use controlled experiments and discover
which algorithm and algorithm configuration results in the best performance for a given
classification task.
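For example, a controlled experiment might spot-check several algorithms on the same task with repeated cross-validation. The sketch below assumes scikit-learn; the dataset and the shortlist of models are illustrative choices:

from numpy import mean
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, RepeatedStratifiedKFold
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

# illustrative synthetic classification task
X, y = make_classification(n_samples=1000, n_features=10, random_state=1)
# evaluate each model with repeated stratified 10-fold cross-validation
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
models = {
    'logistic': LogisticRegression(),
    'tree': DecisionTreeClassifier(),
    'knn': KNeighborsClassifier(),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv)
    print('%s: %.3f' % (name, mean(scores)))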
There are perhaps four main types of classification tasks that you may encounter; they are:
Binary Classification
Multi-Class Classification
Multi-Label Classification
Imbalanced Classification
Let’s take a closer look at each in turn.
Binary Classification
Binary classification refers to those classification tasks that have two class labels.
Examples include:
Email spam detection (spam or not).
Churn prediction (churn or not).
Conversion prediction (buy or not).
Typically, binary classification tasks involve one class that is the normal state (assigned the class label 0) and one class that is the abnormal state (assigned the class label 1).
It is common to model a binary classification task with a model that predicts a Bernoulli
probability distribution for each example.
The Bernoulli distribution is a discrete probability distribution that covers the case where an event has a binary outcome, either 0 or 1. For classification, this means that the model predicts the probability of an example belonging to class 1, or the abnormal state.
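As a minimal sketch, assuming scikit-learn, a logistic regression model predicts exactly this kind of probability via predict_proba; the synthetic dataset is illustrative:

from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# illustrative two-class dataset
X, y = make_blobs(n_samples=100, centers=2, random_state=1)
model = LogisticRegression()
model.fit(X, y)
# each row is [P(class 0), P(class 1)] for one example
print(model.predict_proba(X[:3]))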
Popular algorithms that can be used for binary classification include:
Logistic Regression
k-Nearest Neighbors
Decision Trees
Support Vector Machine
Naive Bayes
Some algorithms are specifically designed for binary classification and do not natively
support more than two classes; examples include Logistic Regression and Support
Vector Machines.
Next, let’s take a closer look at a dataset to develop an intuition for binary classification
problems.
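The example below is a sketch of how such a dataset can be generated with scikit-learn’s make_blobs; the exact parameters are assumptions chosen to match the output that follows (1,000 examples in two equally sized classes):

from collections import Counter
from numpy import where
from sklearn.datasets import make_blobs
from matplotlib import pyplot

# define a synthetic dataset with two classes
X, y = make_blobs(n_samples=1000, centers=2, random_state=1)
# summarize dataset shape
print(X.shape, y.shape)
# summarize observations by class label
print(Counter(y))
# summarize the first few examples
for i in range(10):
    print(X[i], y[i])
# scatter plot, coloring points by class label
for label in [0, 1]:
    row_ix = where(y == label)[0]
    pyplot.scatter(X[row_ix, 0], X[row_ix, 1], label=str(label))
pyplot.legend()
pyplot.show()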
Running the example first summarizes the created dataset showing the 1,000 examples
divided into input (X) and output (y) elements.
The distribution of the class labels is then summarized, showing that instances belong to
either class 0 or class 1 and that there are 500 examples in each class.
Next, the first 10 examples in the dataset are summarized, showing the input values are
numeric and the target values are integers that represent the class membership.
(1000, 2) (1000,)

Counter({0: 500, 1: 500})

[-3.05837272 4.48825769] 0
[-8.60973869 -3.72714879] 1
[1.37129721 5.23107449] 0
[-9.33917563 -2.9544469 ] 1
[-11.57178593 -3.85275513] 1
[-11.42257341 -4.85679127] 1
[-10.44518578 -3.76476563] 1
[-10.44603561 -3.26065964] 1
[-0.61947075 3.48804983] 0
[-10.91115591 -4.5772537 ] 1
Finally, a scatter plot is created for the input variables in the dataset and the points are
colored based on their class value.
We can see two distinct clusters that we might expect would be easy to discriminate.
Scatter Plot of Binary Classification Dataset
Multi-Class Classification
Multi-class classification refers to those classification tasks that have more than two
class labels.
Examples include:
Face classification.
Plant species classification.
Optical character recognition.
Unlike binary classification, multi-class classification does not have the notion of normal
and abnormal outcomes. Instead, examples are classified as belonging to one among a
range of known classes.
The number of class labels may be very large on some problems. For example, a model
may predict a photo as belonging to one among thousands or tens of thousands of faces
in a face recognition system.
Problems that involve predicting a sequence of words, such as text translation models,
may also be considered a special type of multi-class classification. Each word in the
sequence of words to be predicted involves a multi-class classification where the size of
the vocabulary defines the number of possible classes that may be predicted and could
be tens or hundreds of thousands of words in size.
Popular algorithms that can be used for multi-class classification include:
k-Nearest Neighbors.
Decision Trees.
Naive Bayes.
Random Forest.
Gradient Boosting.
Algorithms that are designed for binary classification can be adapted for use on multi-class problems.
This involves fitting multiple binary classification models, either one model for each class vs. all other classes (called one-vs-rest) or one model for each pair of classes (called one-vs-one).
One-vs-Rest: Fit one binary classification model for each class vs. all other
classes.
One-vs-One: Fit one binary classification model for each pair of classes.
Binary classification algorithms that can use these strategies for multi-class classification
include:
Logistic Regression.
Support Vector Machine.
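As a minimal sketch, assuming scikit-learn, both strategies are available as meta-estimators that wrap a binary classifier; the dataset and base models here are illustrative:

from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier

# illustrative three-class dataset
X, y = make_blobs(n_samples=1000, centers=3, random_state=1)
# one-vs-rest: one logistic regression model per class
ovr = OneVsRestClassifier(LogisticRegression()).fit(X, y)
# one-vs-one: one SVM per pair of classes
ovo = OneVsOneClassifier(SVC()).fit(X, y)
print(ovr.predict(X[:5]))
print(ovo.predict(X[:5]))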
Next, let’s take a closer look at a dataset to develop an intuition for multi-class
classification problems.
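The example below is a sketch of how such a dataset can be generated with scikit-learn’s make_blobs using three centers; the parameters are assumptions chosen to match the output that follows:

from collections import Counter
from numpy import where
from sklearn.datasets import make_blobs
from matplotlib import pyplot

# define a synthetic dataset with three classes
X, y = make_blobs(n_samples=1000, centers=3, random_state=1)
print(X.shape, y.shape)
print(Counter(y))
for i in range(10):
    print(X[i], y[i])
# scatter plot, coloring points by class label
for label in [0, 1, 2]:
    row_ix = where(y == label)[0]
    pyplot.scatter(X[row_ix, 0], X[row_ix, 1], label=str(label))
pyplot.legend()
pyplot.show()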
Running the example first summarizes the created dataset showing the 1,000 examples
divided into input (X) and output (y) elements.
The distribution of the class labels is then summarized, showing that instances belong to
class 0, class 1, or class 2 and that there are approximately 333 examples in each class.
Next, the first 10 examples in the dataset are summarized showing the input values are
numeric and the target values are integers that represent the class membership.
(1000, 2) (1000,)

Counter({0: 334, 1: 333, 2: 333})

[-3.05837272 4.48825769] 0
[-8.60973869 -3.72714879] 1
[1.37129721 5.23107449] 0
[-9.33917563 -2.9544469 ] 1
[-8.63895561 -8.05263469] 2
[-8.48974309 -9.05667083] 2
[-7.51235546 -7.96464519] 2
[-7.51320529 -7.46053919] 2
[-0.61947075 3.48804983] 0
[-10.91115591 -4.5772537 ] 1
Finally, a scatter plot is created for the input variables in the dataset and the points are
colored based on their class value.
We can see three distinct clusters that we might expect would be easy to discriminate.
Multi-Label Classification
Multi-label classification refers to those classification tasks that have two or more class
labels, where one or more class labels may be predicted for each example.
Consider the example of photo classification, where a given photo may have multiple
objects in the scene and a model may predict the presence of multiple known objects in
the photo, such as “bicycle,” “apple,” “person,” etc.
This is unlike binary classification and multi-class classification, where a single class
label is predicted for each example.
It is common to model multi-label classification tasks with a model that predicts multiple outputs, with each output predicted as a Bernoulli probability distribution. This is essentially a model that makes multiple binary classification predictions for each example.
Classification algorithms used for binary or multi-class classification cannot be used directly for multi-label classification. Specialized versions of standard classification algorithms can be used, so-called multi-label versions of the algorithms, including:
Multi-label Decision Trees.
Multi-label Random Forests.
Multi-label Gradient Boosting.
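As a sketch, assuming scikit-learn, a decision tree can fit a multi-label target directly when y is a binary matrix with one column per label; the synthetic dataset is illustrative:

from sklearn.datasets import make_multilabel_classification
from sklearn.tree import DecisionTreeClassifier

# illustrative multi-label dataset: y has one 0/1 column per label
X, y = make_multilabel_classification(n_samples=1000, n_features=2,
                                      n_classes=3, n_labels=2, random_state=1)
model = DecisionTreeClassifier()
model.fit(X, y)
# each prediction is a vector of 0/1 label memberships
print(model.predict(X[:3]))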
Next, let’s take a closer look at a dataset to develop an intuition for multi-label
classification problems.
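The example below is a sketch of how such a dataset can be generated with scikit-learn’s make_multilabel_classification; the parameters are assumptions chosen to match the output that follows (1,000 examples, two features, three labels):

from sklearn.datasets import make_multilabel_classification

# define a synthetic multi-label dataset
X, y = make_multilabel_classification(n_samples=1000, n_features=2,
                                      n_classes=3, n_labels=2, random_state=1)
# summarize dataset shape
print(X.shape, y.shape)
# summarize the first few examples
for i in range(10):
    print(X[i], y[i])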
Running the example first summarizes the created dataset showing the 1,000 examples
divided into input (X) and output (y) elements.
Next, the first 10 examples in the dataset are summarized showing the input values are
numeric and the target values are integers that represent the class label membership.
(1000, 2) (1000, 3)

[18. 35.] [1 1 1]
[22. 33.] [1 1 1]
[26. 36.] [1 1 1]
[24. 28.] [1 1 0]
[23. 27.] [1 1 0]
[15. 31.] [0 1 0]
[20. 37.] [0 1 0]
[18. 31.] [1 1 1]
[29. 27.] [1 0 0]
[29. 28.] [1 1 0]
Imbalanced Classification
Imbalanced classification refers to classification tasks where the number of examples in
each class is unequally distributed.
Typically, imbalanced classification tasks are binary classification tasks where the
majority of examples in the training dataset belong to the normal class and a minority of
examples belong to the abnormal class.
Examples include:
Fraud detection.
Outlier detection.
Medical diagnostic tests.
These problems are modeled as binary classification tasks, although they may require specialized techniques. For example, the composition of the training dataset can be changed by undersampling the majority class or oversampling the minority class.
Examples include:
Random Undersampling.
SMOTE Oversampling.
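A minimal sketch of both techniques, assuming the imbalanced-learn library (an extra dependency) is installed:

from collections import Counter
from sklearn.datasets import make_classification
from imblearn.under_sampling import RandomUnderSampler
from imblearn.over_sampling import SMOTE

# illustrative imbalanced dataset (about 1% in class 1)
X, y = make_classification(n_samples=1000, weights=[0.99, 0.01],
                           n_clusters_per_class=1, random_state=1)
print('original:', Counter(y))
# randomly remove majority-class examples
X_under, y_under = RandomUnderSampler(random_state=1).fit_resample(X, y)
print('undersampled:', Counter(y_under))
# synthesize new minority-class examples with SMOTE
X_over, y_over = SMOTE(random_state=1).fit_resample(X, y)
print('oversampled:', Counter(y_over))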
Specialized modeling algorithms may be used that pay more attention to the minority
class when fitting the model on the training dataset, such as cost-sensitive machine
learning algorithms.
Examples include:
Cost-sensitive Logistic Regression.
Cost-sensitive Decision Trees.
Cost-sensitive Support Vector Machines.
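As a minimal sketch, assuming scikit-learn, cost sensitivity can be approximated with the class_weight argument, which penalizes errors on the minority class more heavily during fitting:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# illustrative imbalanced dataset
X, y = make_classification(n_samples=1000, weights=[0.99, 0.01],
                           n_clusters_per_class=1, random_state=1)
# 'balanced' weights classes inversely proportional to their frequency
model = LogisticRegression(class_weight='balanced')
model.fit(X, y)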
Finally, alternative performance metrics may be required when reporting the predicted performance of the model, because classification accuracy can be misleading when almost all examples belong to one class. Examples include:
Precision.
Recall.
F-Measure.
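A minimal sketch of computing these metrics, assuming scikit-learn; the model, split, and dataset are illustrative choices:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score, f1_score

# illustrative imbalanced dataset, split with stratification
X, y = make_classification(n_samples=1000, weights=[0.99, 0.01],
                           n_clusters_per_class=1, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=1)
yhat = LogisticRegression(class_weight='balanced').fit(X_train, y_train).predict(X_test)
print('precision: %.3f' % precision_score(y_test, yhat))
print('recall: %.3f' % recall_score(y_test, yhat))
print('f-measure: %.3f' % f1_score(y_test, yhat))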
Next, let’s take a closer look at a dataset to develop an intuition for imbalanced
classification problems.
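The example below is a sketch of how such a dataset can be generated with scikit-learn’s make_classification; the parameters are assumptions chosen to match the output that follows (about 1 percent of examples in class 1):

from collections import Counter
from numpy import where
from sklearn.datasets import make_classification
from matplotlib import pyplot

# define a synthetic imbalanced dataset
X, y = make_classification(n_samples=1000, n_features=2, n_informative=2,
                           n_redundant=0, n_clusters_per_class=1,
                           weights=[0.99, 0.01], random_state=1)
print(X.shape, y.shape)
print(Counter(y))
for i in range(10):
    print(X[i], y[i])
# scatter plot, coloring points by class label
for label in [0, 1]:
    row_ix = where(y == label)[0]
    pyplot.scatter(X[row_ix, 0], X[row_ix, 1], label=str(label))
pyplot.legend()
pyplot.show()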
Running the example first summarizes the created dataset showing the 1,000 examples
divided into input (X) and output (y) elements.
The distribution of the class labels is then summarized, showing the severe class
imbalance with about 980 examples belonging to class 0 and about 20 examples
belonging to class 1.
Next, the first 10 examples in the dataset are summarized showing the input values are
numeric and the target values are integers that represent the class membership. In this
case, we can see that most examples belong to class 0, as we expect.
(1000, 2) (1000,)

Counter({0: 983, 1: 17})

[0.86924745 1.18613612] 0
[1.55110839 1.81032905] 0
[1.29361936 1.01094607] 0
[1.11988947 1.63251786] 0
[1.04235568 1.12152929] 0
[1.18114858 0.92397607] 0
[1.1365562 1.17652556] 0
[0.46291729 0.72924998] 0
[0.18315826 1.07141766] 0
[0.32411648 0.53515376] 0
Finally, a scatter plot is created for the input variables in the dataset and the points are
colored based on their class value.
We can see one main cluster for examples that belong to class 0 and a few scattered
examples that belong to class 1. The intuition is that datasets with this property of
imbalanced class labels are more challenging to model.
Further Reading
This section provides more resources on the topic if you are looking to go deeper.