Lecture Notes1

This document introduces different types of machine learning algorithms. Classification refers to predicting discrete categorical outputs and can be binary or multiclass. Supervised learning models can be evaluated on a validation set to measure performance. Unsupervised learning finds hidden patterns in unlabeled data and groups similar data into clusters. Semi-supervised learning uses both labeled and unlabeled data by training on a small labeled set and using the model to label more data.

Uploaded by

fgsfgs

Chapter 1 Introduction to Machine Learning

Classification refers to the case when the output variable is a discrete
value or categorical in nature. Classification comes in two types.

• Binary classification

• Multiclass classification

When the target has exactly two categories, the task is referred to as
binary classification; when it has more than two, it is known as multiclass
classification, as shown in Figure 1-4.

Figure 1-4.  Binary versus multiclass
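
The distinction can be sketched in a few lines of Python. This is only an illustration: the function name classification_type and both label lists are hypothetical and do not come from the chapter.

```python
def classification_type(labels):
    """Return 'binary' for two distinct classes, 'multiclass' for more."""
    n_classes = len(set(labels))
    if n_classes < 2:
        raise ValueError("classification needs at least two distinct classes")
    return "binary" if n_classes == 2 else "multiclass"


# Hypothetical target columns, made up for illustration.
churn_labels = ["yes", "no", "no", "yes"]
species_labels = ["setosa", "virginica", "versicolor", "setosa"]

print(classification_type(churn_labels))
print(classification_type(species_labels))
```

The only thing that decides the type is the number of distinct values in the target column, not the number of rows or features.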

Another property of supervised learning is that the model’s
performance can be evaluated. An evaluation metric suited to the type of
model (classification or regression) is applied to measure performance.
This is done mainly by splitting the labeled data into two sets, the train
set and the validation set. The model is trained on the train set and its
performance is tested on the validation set, for which we already know
the right label/outcome.
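
A minimal sketch of this train/validation workflow follows, using only the Python standard library. The threshold classifier, the synthetic data, and every function name here are assumptions made for illustration; the chapter does not prescribe a particular model or metric.

```python
import random


def train_validation_split(rows, validation_fraction=0.25, seed=0):
    """Shuffle labeled rows and split them into a train set and a validation set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_validation = int(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


def mean(values):
    return sum(values) / len(values)


def fit_threshold(train_rows):
    """Toy binary classifier: a threshold midway between the two class means.

    Assumes class 1 tends to have larger feature values than class 0.
    """
    mean_0 = mean([x for x, y in train_rows if y == 0])
    mean_1 = mean([x for x, y in train_rows if y == 1])
    return (mean_0 + mean_1) / 2


def predict(threshold, x):
    return 1 if x > threshold else 0


def accuracy(threshold, rows):
    """Fraction of rows whose predicted label matches the known label."""
    correct = sum(predict(threshold, x) == y for x, y in rows)
    return correct / len(rows)


# Hypothetical labeled data: (feature, label) pairs.
rows = [(float(i), 0) for i in range(10)] + [(float(i), 1) for i in range(20, 30)]

train_rows, validation_rows = train_validation_split(rows)
threshold = fit_threshold(train_rows)          # train on the train set only
print(accuracy(threshold, validation_rows))    # evaluate on held-out rows
```

The validation rows play no part in fitting the threshold, which is what makes the accuracy on them a fair estimate of how the model behaves on unseen data.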


Unsupervised Learning
Unsupervised learning is another category of machine learning that is used
heavily in business applications. It differs from supervised learning in terms
of the output labels. In unsupervised learning, we build models on the same
sort of data as in supervised learning, except that the dataset does not
contain any label or outcome column. Essentially, we apply the model to the
data without any right answers. The machine tries to find hidden patterns and
useful signals in the data that can later be used for other applications. The
main objective is to probe the data and come up with hidden patterns and a
similarity structure within the dataset, as shown in Figure 1-5. One of the use
cases is to find patterns within customer data and group the customers into
different clusters. It can also identify the attributes that distinguish any two
groups. From a validation perspective, there is no measure of accuracy for
unsupervised learning, because there are no known labels to compare against.
The clustering done by person A can be totally different from that of person B,
depending on the parameters used to build the model. There are different
types of unsupervised learning.

• K-means clustering

• Mapping of nearest neighbor

Figure 1-5.  Clustering
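
As a rough illustration of K-means clustering, the sketch below alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points. It is a minimal version written for this note, not a production implementation; the customer points, the function names, and the choice of k = 2 are all assumptions.

```python
import random


def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def nearest_centroid(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)),
               key=lambda c: squared_distance(point, centroids[c]))


def k_means(points, k, iterations=20, seed=0):
    """Return (centroids, assignments) after a fixed number of iterations."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    for _ in range(iterations):
        # Assignment step: group points by their nearest centroid.
        clusters = [[] for _ in range(k)]
        for point in points:
            clusters[nearest_centroid(point, centroids)].append(point)
        # Update step: move each centroid to the mean of its members.
        for c, members in enumerate(clusters):
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = tuple(sum(dim) / len(members)
                                     for dim in zip(*members))
    assignments = [nearest_centroid(point, centroids) for point in points]
    return centroids, assignments


# Hypothetical customers described by two numeric attributes; no labels.
customers = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
             (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

centroids, assignments = k_means(customers, k=2)
print(assignments)
```

The cluster numbers themselves are arbitrary and depend on the seed; only the grouping is meaningful. Changing k or the starting centroids can change the result, which is the sense in which two people can arrive at different clusterings of the same data.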



Semi-supervised Learning
As the name suggests, semi-supervised learning lies somewhere in between
supervised and unsupervised learning. In fact, it uses both techniques.
This type of learning is mainly relevant when we are dealing with a mixed
dataset that contains both labeled and unlabeled data. Sometimes the data
is entirely unlabeled, and we label a part of it manually. The idea of
semi-supervised learning is to use this small portion of labeled data to
train the model and then use the model to label the remaining data, which
can then be used for other purposes. This is also known as pseudo-labeling,
as it labels the unlabeled data using the predictions made by the supervised
model. To take a simple example, say we have lots of images of different
brands from social media, and most of them are unlabeled. Using
semi-supervised learning, we can label some of these images manually and
then train our model on the labeled images. We then use the model’s
predictions to label the remaining images, so that the unlabeled data
becomes fully labeled.
The next step in semi-supervised learning is to retrain the model
on the entire labeled dataset. One advantage is that the model is now
trained on a bigger dataset than before, which can make it more robust
and better at predictions. The other advantage is that semi-supervised
learning saves much of the effort and time that would otherwise go into
manually labeling the data. The flip side is that it is difficult to get
high-quality pseudo-labels, because the model that produces them is
trained on only a small amount of labeled data. However, it is often still
a better option than manually labeling all of the data, which can be both
expensive and time-consuming. This is how semi-supervised learning uses
both supervised and unsupervised learning to generate the labeled
data. Businesses that face challenges regarding the costs associated with
labeling training data usually go for semi-supervised learning.
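
The pseudo-labeling workflow described above can be sketched as follows. The nearest-centroid classifier stands in for whatever supervised model is actually used, and the brand names, feature values, and function names are hypothetical; real image data would first need to be turned into numeric features.

```python
def fit_centroids(labeled_rows):
    """Train a toy nearest-centroid classifier: one mean point per class."""
    by_class = {}
    for features, label in labeled_rows:
        by_class.setdefault(label, []).append(features)
    return {
        label: tuple(sum(dim) / len(rows) for dim in zip(*rows))
        for label, rows in by_class.items()
    }


def predict(centroids, features):
    """Return the label whose centroid is closest to the features."""
    def squared_distance(centroid):
        return sum((f - c) ** 2 for f, c in zip(features, centroid))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))


# Step 1: a small, manually labeled portion of the data (hypothetical).
labeled = [((1.0, 1.0), "brand_a"), ((9.0, 9.0), "brand_b")]
# The larger remaining portion has no labels.
unlabeled = [(1.5, 0.5), (2.0, 1.5), (8.0, 9.5), (8.5, 8.0)]

# Step 2: train on the labeled portion only.
model = fit_centroids(labeled)

# Step 3: use the model's predictions as pseudo-labels.
pseudo_labeled = [(features, predict(model, features)) for features in unlabeled]

# Step 4: retrain on the manually labeled and pseudo-labeled data together.
final_model = fit_centroids(labeled + pseudo_labeled)

print(pseudo_labeled)
print(predict(final_model, (0.0, 0.0)))
```

Any mistakes the first model makes in step 3 are carried into the retraining in step 4, which is why the quality of the pseudo-labels matters so much.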
