Lecture 3: Applications of Machine Learning Algorithms Jul. 06 & 09, 2018
1 Supervised Learning
This section will help you understand what to expect from this course. It also lists real-world
applications to give a sense of the depth of the subject. Supervised learning can be carried out either
as classification or as regression. We discuss both in detail here:
1.1 Classification
If the output of a supervised learning problem is categorical, it is a classification problem. Here, the
goal is to map the input x to an output y, where y ∈ {1,2,...,C} and C is the number of classes. If C = 2,
this is called binary classification; if C > 2, it is multiclass classification. Labels are used to
identify the classes.
Sometimes, however, a single label cannot identify the class exactly, i.e., the classes overlap and an
input may belong to more than one of them. This type of classification is called multilabel
classification.
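A minimal sketch of how the three settings differ in what a label looks like. The example labels and the helper name `describe_task` are hypothetical and chosen only for illustration; they are not part of the lecture.

```python
def describe_task(labels, num_classes):
    """Name the classification setting from the label format and C."""
    # In the multilabel setting each target is a collection of classes.
    if any(isinstance(y, (set, frozenset, list, tuple)) for y in labels):
        return "multilabel"
    return "binary" if num_classes == 2 else "multiclass"


binary_labels = [1, 2, 2, 1]            # y in {1, 2}, so C = 2
multiclass_labels = [1, 3, 2, 3]        # y in {1, 2, 3}, so C = 3
multilabel_labels = [{1}, {1, 3}, {2}]  # each y is a subset of {1, 2, 3}

print(describe_task(binary_labels, 2))
print(describe_task(multiclass_labels, 3))
print(describe_task(multilabel_labels, 3))
```

In the binary and multiclass cases every input gets exactly one class; in the multilabel case an input may carry several classes at once.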
Everything looks simple here. Is it the reality of supervised learning? The answer is a big no.
Question 1:
What is the challenge/problem?
The problem of supervised learning is to find the function that makes predictions for new input
data.
Estimating this function is known as function approximation. Supervised learning can generally
be written as y = f(x), where f is an unknown function. We need to estimate f from the training data in order to
make predictions on inputs that were not seen before. We denote the estimate by f̂ and the prediction by ŷ = f̂(x).
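The idea of estimating the unknown f from training pairs can be sketched as follows. This is only an illustration under stated assumptions: the lecture does not prescribe a method, so a 1-nearest-neighbour rule is used here as one simple choice, and the training data and the name `fit_nearest_neighbour` are made up.

```python
def fit_nearest_neighbour(xs, ys):
    """Return f_hat, an estimate of the unknown f built from training pairs."""
    training = list(zip(xs, ys))

    def f_hat(x):
        # Predict the label of the training input closest to x.
        _, nearest_y = min(training, key=lambda pair: abs(pair[0] - x))
        return nearest_y

    return f_hat


# Hypothetical one-dimensional training set with binary labels.
train_x = [1.0, 2.0, 3.0, 8.0, 9.0, 10.0]
train_y = [0, 0, 0, 1, 1, 1]

f_hat = fit_nearest_neighbour(train_x, train_y)

# Neither 2.5 nor 7.2 appears in the training set.
print(f_hat(2.5), f_hat(7.2))
```

The point of the sketch is that `f_hat` is built only from the observed pairs, yet it is asked to predict on inputs it has never seen.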
The diagram below explains the challenge:
There are two possible labels: yes/no, or equivalently 1/0. The diagram presents two tables. One table
relates the training examples to their features and has dimension N × D (N examples, D features). The number of
examples and features is arbitrary, and both are critical parameters to consider for prediction. The other table
gives the label, 1 or 0, for each corresponding row of the first table.
There are some test cases shown in the diagram:
• Blue crescent
• Yellow torus
• Blue arrow
However, these test cases do not match any of the labelled examples shown. They can be sorted neither by
shape alone nor by color alone, which creates ambiguity. To handle this ambiguity, probabilistic
prediction can be used.
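A minimal sketch of probabilistic prediction for the three test cases. Several things here are assumptions: the training examples in `TRAIN` are invented (the diagram's actual data is not reproduced in these notes), the feature vocabularies are hypothetical, and naive Bayes with add-one smoothing is used only as one simple model, not as the method of the lecture.

```python
from collections import Counter

# Hypothetical feature vocabularies (assumption).
COLORS = ["blue", "red", "yellow"]
SHAPES = ["circle", "square", "arrow", "crescent", "torus"]

# Hypothetical training set: ((color, shape), label).
TRAIN = [
    (("blue", "circle"), 1),
    (("blue", "square"), 1),
    (("red", "circle"), 0),
    (("yellow", "square"), 0),
    (("red", "arrow"), 0),
]


def predict_proba(x, train=TRAIN, vocab=(COLORS, SHAPES)):
    """Return {label: p(y = label | x)} under naive Bayes with add-one smoothing."""
    label_counts = Counter(y for _, y in train)
    scores = {}
    for label, n_label in label_counts.items():
        # Smoothed prior p(y).
        score = (n_label + 1) / (len(train) + len(label_counts))
        for j, value in enumerate(x):
            n_match = sum(1 for xi, yi in train if yi == label and xi[j] == value)
            # Smoothed likelihood p(x_j | y); unseen values keep a small non-zero mass.
            score *= (n_match + 1) / (n_label + len(vocab[j]))
        scores[label] = score
    total = sum(scores.values())
    return {label: s / total for label, s in scores.items()}


for test_case in [("blue", "crescent"), ("yellow", "torus"), ("blue", "arrow")]:
    print(test_case, predict_proba(test_case))
```

Instead of forcing a hard yes/no answer on an unseen combination, the model reports how strongly the evidence points to each label.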
Question 2:
What is argmax?
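In brief, argmax returns the argument at which a function attains its maximum, whereas max returns the maximum value itself. In probabilistic prediction it selects the most probable class, ŷ = argmax over c of p(y = c | x). A minimal sketch, with made-up class probabilities and a hypothetical helper named `argmax`:

```python
def argmax(scores):
    """Return the key whose value is largest (the first such key on ties)."""
    return max(scores, key=scores.get)


# Hypothetical values of p(y = c | x) for C = 3 classes.
class_probabilities = {1: 0.2, 2: 0.7, 3: 0.1}

print(max(class_probabilities.values()))  # the largest probability
print(argmax(class_probabilities))        # the class that attains it
```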
2.1 Supervised Learning - Classification Approach:
• Document classification - which category does the document belong to?
2.2 Regression:
• Predicting tomorrow’s stock market
• Discovering latent factors - dimensionality reduction (3D to 2D) and Principal Component Analysis
(PCA).
• Matrix completion - filling in missing entries, which are recorded as NaN ("Not a Number"), e.g. image inpainting,
collaborative filtering and market basket analysis.
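For the regression item listed under 2.2, where the output is a real number rather than a class, a minimal sketch is an ordinary least squares line fit. The prices below are invented, the name `fit_line` is hypothetical, and extrapolating a straight line is only an illustration of regression, not a realistic way to forecast a market.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical closing prices for five consecutive days.
days = [1, 2, 3, 4, 5]
prices = [10.0, 12.0, 14.0, 16.0, 18.0]

slope, intercept = fit_line(days, prices)
tomorrow = slope * 6 + intercept  # prediction for day 6
print(tomorrow)
```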