Lecture Notes1
• Binary classification
• Multiclass classification
Chapter 1 Introduction to Machine Learning
Unsupervised Learning
Unsupervised learning is another category of machine learning that is used
heavily in business applications. It differs from supervised learning in terms
of the output labels. In unsupervised learning, we build models on data similar
to that used in supervised learning, except that the dataset does not contain
any label or outcome column. Essentially, we apply the model to the data
without any right answers. The machine tries to find hidden patterns and
useful signals in the data that can later be used for other applications. The
main objective is to probe the data and uncover hidden patterns and a
similarity structure within the dataset, as shown in Figure 1-5. One use case
is to find patterns within customer data and group the customers into
different clusters. It can also identify the attributes that distinguish any
two groups. From a validation perspective, there is no measure of accuracy for
unsupervised learning, because there are no true labels to compare against.
The clustering done by person A can be totally different from that of person B,
depending on the parameters used to build the model. There are different types
of unsupervised learning.
• K-means clustering
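To make the customer-grouping use case concrete, here is a minimal sketch of K-means written in plain Python. The customer data, the function names (sq_dist, kmeans), and the parameter values are hypothetical and chosen only for illustration. In practice, a library implementation such as scikit-learn's KMeans would normally be used instead.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def kmeans(points, k, n_iter=20, seed=0):
    """Group 2-D points into k clusters; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct points as starting centers
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return labels, centroids

# Hypothetical customer data: (annual spend in thousands, visits per month).
# Note that there is no label column anywhere in this dataset.
customers = [(1.0, 2.0), (1.2, 1.8), (0.8, 2.2),
             (8.0, 9.0), (8.3, 9.4), (7.9, 8.7)]

labels_k2, centers_k2 = kmeans(customers, k=2)  # "person A" chooses k=2
labels_k3, centers_k3 = kmeans(customers, k=3)  # "person B" chooses k=3

print(labels_k2)
print(labels_k3)
```

The two calls differ only in the parameter k, yet they can return different groupings of the same customers. This is the sense in which two people can cluster the same data differently, and there is no labeled answer to say which grouping is correct.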
Semi-supervised Learning
As the name suggests, semi-supervised learning lies somewhere between
supervised and unsupervised learning. In fact, it uses both techniques.
This type of learning is mainly relevant when we are dealing with a mixed
dataset that contains both labeled and unlabeled data. Sometimes the data is
completely unlabeled, and we label some part of it manually. The whole idea
of semi-supervised learning is to use this small portion of labeled data to
train the model and then use the model to label the remaining data, which
can then be used for other purposes. This is also known as pseudo-labeling,
as it labels the unlabeled data using the predictions made by the supervised
model. As a simple example, say we have lots of images of different brands
from social media, and most of them are unlabeled. Using semi-supervised
learning, we can label some of these images manually and then train our
model on the labeled images. We then use the model’s predictions to label
the remaining images, turning the unlabeled data into fully labeled data.
The next step in semi-supervised learning is to retrain the model
on the entire labeled dataset. The advantage is that the model is now
trained on a bigger dataset than before, which can make it more robust
and better at predictions. The other advantage is that semi-supervised
learning saves a lot of the effort and time that would otherwise go into
manually labeling the data. The flip side is that it is difficult to get
high performance from pseudo-labeling, as the model is trained on only a
small amount of labeled data before it makes the predictions. However, it
is still a better option than manually labeling all the data, which can be
both expensive and time-consuming. This is how semi-supervised learning
uses both supervised and unsupervised learning to generate labeled
data. Businesses that face high costs for labeling their training data
usually go for semi-supervised learning.
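The pseudo-labeling workflow described above (train on the small labeled portion, label the rest with the model's predictions, then retrain on everything) can be sketched as follows. This is only an illustration under stated assumptions: a simple nearest-centroid classifier stands in for "the model", and the two-dimensional image features, the brand names, and the helper functions fit_centroids and predict are all hypothetical.

```python
def fit_centroids(X, y):
    """Train a nearest-centroid classifier: one mean point per class."""
    sums, counts = {}, {}
    for (a, b), label in zip(X, y):
        sa, sb = sums.get(label, (0.0, 0.0))
        sums[label] = (sa + a, sb + b)
        counts[label] = counts.get(label, 0) + 1
    return {label: (s[0] / counts[label], s[1] / counts[label])
            for label, s in sums.items()}

def predict(model, X):
    """Assign each sample the class whose centroid is closest."""
    return [
        min(model, key=lambda lab: (a - model[lab][0]) ** 2
                                   + (b - model[lab][1]) ** 2)
        for a, b in X
    ]

# Hypothetical 2-D image features; only four images were labeled by hand.
X_labeled = [(1.0, 1.0), (1.5, 0.5), (9.0, 9.0), (8.5, 9.5)]
y_labeled = ["brand_a", "brand_a", "brand_b", "brand_b"]
X_unlabeled = [(0.5, 1.5), (2.0, 1.0), (9.5, 8.5), (8.0, 8.0)]

# Step 1: train on the small labeled portion.
model = fit_centroids(X_labeled, y_labeled)

# Step 2: pseudo-label the remaining data using the model's predictions.
pseudo_labels = predict(model, X_unlabeled)

# Step 3: retrain on the hand-labeled and pseudo-labeled data combined.
final_model = fit_centroids(X_labeled + X_unlabeled,
                            y_labeled + pseudo_labels)

print(pseudo_labels)
```

Any mistakes made in step 2 are carried into the retraining set in step 3. This is one reason the quality of pseudo-labels depends so heavily on the small labeled portion.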