Unit 2 Supervised Learning and Applications
For instance, an algorithm can learn to predict whether a given email is spam or ham (not
spam), as illustrated below.
Classification algorithms can be better understood using the diagram below, in which there are
two classes, Class A and Class B. The points within each class have features that are similar to
one another and dissimilar to those of the other class.
An algorithm that implements classification on a dataset is known as a classifier. There are
two types of classification:
Binary Classifier: If the classification problem has only two possible outcomes, it is called a
binary classifier.
Examples: YES or NO, MALE or FEMALE, SPAM or NOT SPAM, CAT or DOG, etc.
Multi-class Classifier: If a classification problem has more than two possible outcomes, it is
called a multi-class classifier.
Example: Classifications of types of crops, Classification of types of music.
Classification algorithms fall into two broad groups (a quick scikit-learn sketch follows the list):
o Linear Models
o Logistic Regression
o Support Vector Machines
o Non-linear Models
o K-Nearest Neighbours
o Kernel SVM
o Naive Bayes
o Decision Tree Classification
o Random Forest Classification
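A minimal sketch showing how each listed algorithm can be instantiated in scikit-learn; the synthetic dataset is an assumption used only so the example runs end to end:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

# Synthetic two-class dataset (assumed), used only so the sketch runs
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

classifiers = {
    "Logistic Regression": LogisticRegression(),
    "Linear SVM": SVC(kernel="linear"),
    "K-Nearest Neighbours": KNeighborsClassifier(n_neighbors=5),
    "Kernel SVM": SVC(kernel="rbf"),
    "Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
}

for name, clf in classifiers.items():
    clf.fit(X_train, y_train)                  # train on the training split
    print(name, clf.score(X_test, y_test))     # accuracy on the test split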
Linear Regression
Linear regression is one of the easiest and most popular Machine Learning algorithms. It is a
statistical method that is used for predictive analysis. Linear regression makes predictions for
continuous/real or numeric variables such as sales, salary, age, product price, etc.
The linear regression algorithm shows a linear relationship between a dependent variable (y) and
one or more independent variables (x), hence the name linear regression. Because the relationship
is linear, the model describes how the value of the dependent variable changes with the value of
the independent variable.
The linear regression model provides a sloped straight line representing the relationship between
the variables. Consider the below image:
Mathematically, we can represent a linear regression as:
y = a0 + a1x + ε
Here,
y = Dependent Variable (Target Variable)
x = Independent Variable (Predictor Variable)
a0 = intercept of the line (gives an additional degree of freedom)
a1 = linear regression coefficient (scale factor applied to each input value)
ε = random error
The x and y values come from the training dataset used to fit the linear regression model.
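As an illustration (not from the original text), a linear regression can be fitted with scikit-learn; the small dataset below is assumed for demonstration, and the fitted intercept and coefficient correspond to a0 and a1 in the equation above:

import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([[1], [2], [3], [4], [5]])    # independent variable (one feature)
y = np.array([3.1, 4.9, 7.2, 8.8, 11.1])   # dependent variable

model = LinearRegression().fit(x, y)
print("a0 (intercept):   ", model.intercept_)
print("a1 (coefficient): ", model.coef_[0])
print("prediction at x=6:", model.predict([[6]])[0])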
Evaluation Metrics:
Regression metrics are quantitative measures used to evaluate the quality of a regression
model. Scikit-learn provides several metrics, each with its own strengths and limitations,
to assess how well a model fits the data.
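A brief sketch of computing common regression metrics with scikit-learn; the y_true and y_pred values are assumed example numbers:

from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = [3.0, 5.0, 7.0, 9.0]   # actual target values (assumed)
y_pred = [2.8, 5.3, 6.9, 9.4]   # model predictions (assumed)

print("MAE:", mean_absolute_error(y_true, y_pred))   # average absolute error
print("MSE:", mean_squared_error(y_true, y_pred))    # average squared error
print("R2 :", r2_score(y_true, y_pred))              # fraction of variance explained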
Nonlinear regression
Nonlinear regression models the relationship between the variables with a nonlinear function:
y = f(x, β) + ϵ
where:
o x is the independent variable,
o β is the vector of model parameters to be estimated,
o f is a nonlinear function of x and β, and
o ϵ is the random error term.
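As a sketch of nonlinear regression in practice, SciPy's curve_fit can estimate β by nonlinear least squares; the exponential form of f and the synthetic data are assumptions chosen for illustration:

import numpy as np
from scipy.optimize import curve_fit

def f(x, b0, b1):
    # assumed nonlinear model: exponential growth in x
    return b0 * np.exp(b1 * x)

rng = np.random.default_rng(0)
x = np.linspace(0, 2, 20)
y = 1.5 * np.exp(0.8 * x) + rng.normal(0, 0.1, x.size)   # noisy samples of f

beta, _ = curve_fit(f, x, y, p0=[1.0, 1.0])   # estimate β by least squares
print("estimated parameters:", beta)           # close to the true (1.5, 0.8)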
K-Nearest Neighbour (K-NN) Algorithm
o The K-NN algorithm simply stores the dataset during the training phase; when it receives new
data, it classifies that data into the category most similar to the new data.
o Example: Suppose we have an image of a creature that looks similar to both a cat and a dog,
and we want to know whether it is a cat or a dog. For this identification, we can use the K-NN
algorithm, since it works on a similarity measure. Our K-NN model will compare the features
of the new image with those of the cat and dog images and, based on the most similar
features, place it in either the cat or the dog category.
Why do we need a K-NN Algorithm?
Suppose there are two categories, Category A and Category B, and we have a new data point
x1. In which of these categories will this data point lie? To solve this type of problem, we need the
K-NN algorithm. With the help of K-NN, we can easily identify the category or class of a particular
data point. Consider the below diagram:
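The same Category A / Category B scenario can be sketched in code with scikit-learn's KNeighborsClassifier; the point coordinates are assumed purely for illustration:

from sklearn.neighbors import KNeighborsClassifier

X = [[1, 2], [2, 3], [3, 3],    # Category A points (assumed)
     [6, 5], [7, 7], [8, 6]]    # Category B points (assumed)
y = ["A", "A", "A", "B", "B", "B"]

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)                   # the "training" phase just stores the dataset
print(knn.predict([[5, 5]]))    # classify the new data point x1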
Decision Tree Algorithm
o It is called a decision tree because, similar to a tree, it starts with the root node, which expands
into further branches and constructs a tree-like structure.
o In order to build a tree, we use the CART algorithm, which stands for Classification and
Regression Tree algorithm.
o A decision tree simply asks a question and, based on the answer (Yes/No), further splits the
tree into subtrees.
o Below diagram explains the general structure of a decision tree:
Note: A decision tree can contain categorical data (YES/NO) as well as numeric data.
Leaf Node: Leaf nodes are the final output nodes; the tree cannot be split further after a leaf
node.
Splitting: Splitting is the process of dividing the decision node/root node into sub-nodes according to the
given conditions.
Pruning: Pruning is the process of removing the unwanted branches from the tree.
Parent/Child node: The root node of the tree is called the parent node, and other nodes are called the
child nodes.
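A small sketch of building a decision tree with scikit-learn, whose implementation follows a CART-style algorithm; the tiny dataset and feature names are assumptions for illustration:

from sklearn.tree import DecisionTreeClassifier, export_text

# Assumed features: [age, income]; label: whether the person buys a product
X = [[25, 30000], [35, 60000], [45, 80000], [20, 20000], [50, 90000]]
y = ["no", "yes", "yes", "no", "yes"]

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
# Print the learned root node, branches, and leaf nodes as text
print(export_text(tree, feature_names=["age", "income"]))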
Support Vector Machine (SVM) Algorithm
The goal of the SVM algorithm is to create the best line or decision boundary that can segregate
n-dimensional space into classes so that we can easily place a new data point in the correct
category in the future. This best decision boundary is called a hyperplane.
SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme
cases are called support vectors, and hence the algorithm is termed a Support Vector Machine.
Consider the below diagram in which there are two different categories that are classified using
a decision boundary or hyperplane:
Example: SVM can be understood with the example we used for the KNN classifier. Suppose we
see a strange cat that also has some features of dogs. If we want a model that can accurately
identify whether it is a cat or a dog, we can build one using the SVM algorithm. We first train the
model on many images of cats and dogs so that it learns the distinguishing features of each, and
then test it on the strange creature. The SVM creates a decision boundary between the two
classes (cat and dog) and chooses the extreme cases (support vectors) of each. On the basis of
the support vectors, it classifies the creature as a cat. Consider the below diagram:
SVM algorithm can be used for Face detection, image classification, text categorization, etc.
Types of SVM
o Linear SVM: Linear SVM is used for linearly separable data. If a dataset can be
classified into two classes by a single straight line, the data is termed linearly
separable, and the classifier used is called a Linear SVM classifier.
o Non-linear SVM: Non-linear SVM is used for non-linearly separable data. If a dataset
cannot be classified by a straight line, the data is termed non-linear, and the
classifier used is called a Non-linear SVM classifier.
1. Hyperplane:
o The goal of SVM is to find the optimal hyperplane that best separates the data points into
different classes. In a two-dimensional space, this hyperplane is simply a line, but in higher
dimensions, it becomes a plane or a hyperplane.
o The best hyperplane is the one that maximizes the margin—the distance between the
hyperplane and the nearest data points from each class, known as support vectors.
2. Support Vectors:
o These are the data points that lie closest to the hyperplane. They are critical because they
define the position and orientation of the hyperplane. Removing these points would change
the decision boundary.
3. Margin:
o The margin is the distance between the hyperplane and the nearest support vectors from
both classes. A larger margin is generally better, as it means the model is more confident in
its classification.
4. Linear Separability:
o In cases where the data is linearly separable (i.e., it can be perfectly separated by a straight
line or hyperplane), SVM will find the hyperplane that maximizes the margin between the
two classes.
5. Non-Linearly Separable Data:
o When the data is not linearly separable, SVM uses a technique called the kernel trick to
transform the data into a higher-dimensional space where it becomes linearly separable.
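To make the kernel trick concrete, here is a sketch comparing a linear SVM with an RBF-kernel SVM on data that is not linearly separable; the concentric-circles dataset is an assumption chosen for illustration:

from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two concentric circles: a classic non-linearly separable dataset
X, y = make_circles(n_samples=200, noise=0.05, factor=0.5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

linear_svm = SVC(kernel="linear").fit(X_train, y_train)
kernel_svm = SVC(kernel="rbf").fit(X_train, y_train)   # uses the kernel trick

print("linear SVM accuracy:", linear_svm.score(X_test, y_test))   # poor
print("kernel SVM accuracy:", kernel_svm.score(X_test, y_test))   # much better
print("support vectors per class:", kernel_svm.n_support_)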
Applications of Supervised Learning in Business
1. Pricing Optimization
Problem: Businesses need to find the optimal price point for products or services to maximize
revenue and profits while staying competitive.
Regression models (e.g., Linear Regression, Ridge Regression, and Decision Trees) can predict the
optimal price for a product by analyzing historical sales data, competitor pricing, demand
fluctuations, and customer behavior.
Example: An online retailer uses a supervised learning model to determine dynamic pricing
strategies based on competitor pricing, time of year, and customer demand, adjusting prices in real-
time to optimize profits.
2. Customer Retention and Churn Prediction
Churn Prediction: Classification algorithms like Logistic Regression, Support Vector Machines (SVM),
or Random Forests can predict which customers are likely to churn based on past interactions,
purchase frequency, and customer service calls.
Customer Lifetime Value (CLV) Prediction: Regression models can predict the long-term value of a
customer based on transaction history, buying frequency, and demographic data.
Example: A telecom company uses churn prediction models to identify at-risk customers and offers
targeted retention programs (e.g., discounts or personalized services) to prevent churn.
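As a small illustration of the churn-prediction idea above (a sketch with assumed feature names and synthetic data, not from the original text):

import numpy as np
from sklearn.linear_model import LogisticRegression

# Assumed features: [purchases per month, customer service calls]
X = np.array([[8, 0], [6, 1], [1, 5], [2, 4], [7, 1], [0, 6]])
y = np.array([0, 0, 1, 1, 0, 1])   # 1 = churned, 0 = retained (assumed labels)

model = LogisticRegression().fit(X, y)
# Estimated probability of churn for a customer with 2 purchases and 3 calls
print("churn probability:", model.predict_proba([[2, 3]])[0, 1])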
3. Sales Forecasting
Problem: Businesses need accurate sales forecasts to manage inventory, plan production, and
allocate resources efficiently.
Supervised models can predict future sales based on historical sales data, market trends, and
seasonal factors.
Lead Scoring: Classification models like Logistic Regression or Decision Trees can predict the
likelihood of a sales lead converting into a customer based on factors like interaction history, product
interest, and lead demographics.
Example: A retail chain uses supervised learning models to predict future sales across different
locations based on factors like previous sales, seasonal trends, and local economic conditions.
4. Marketing Campaign Optimization
Problem: Marketers need to design personalized campaigns and allocate resources to the most
effective channels for customer acquisition and retention.
Targeted Marketing: Classification models can segment customers based on their behavior,
preferences, and demographics. Marketers can then create personalized campaigns targeted at
specific customer segments.
Campaign Response Prediction: Regression models can predict how likely a customer is to respond
to a specific marketing campaign (e.g., email, social media ad) based on past campaign data.
Example: An e-commerce company uses a supervised learning model to segment customers into
categories like frequent buyers, discount seekers, and one-time shoppers, creating targeted email
campaigns for each group.
Model evaluation is the process of using metrics to analyze the performance of a model. Model
development is a multi-step process, and we should keep a check on how well the model
generalizes to future predictions. Evaluating a model therefore plays a vital role in judging its
performance, and it also helps us analyze a model's key weaknesses. Common metrics include
Accuracy, Precision, Recall, F1 score, Area Under the Curve (AUC), the Confusion Matrix, and
Mean Squared Error. Cross-validation is a technique followed during the training phase that also
serves as a model evaluation technique.
Evaluation Metrics for Classification Task
In the Python code below, we import the iris dataset, which has features like the length and width
of sepals and petals. The target values are Iris setosa, Iris virginica, and Iris versicolor. After
importing the dataset, we divide it into train and test sets in the ratio 80:20. We then train a
decision tree classifier, perform prediction, and calculate the accuracy score, precision, recall, and
F1 score. We also plot the confusion matrix. The full listing appears after the library descriptions
below.
Importing Libraries and Dataset
Python libraries make it very easy for us to handle the data and perform typical and complex tasks
with a single line of code.
Pandas – This library helps to load the data frame in a 2D array format and has multiple
functions to perform analysis tasks in one go.
Numpy – Numpy arrays are very fast and can perform large computations in a very short
time.
Matplotlib/Seaborn – These libraries are used to draw visualizations.
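A minimal sketch of the workflow described above, using scikit-learn's built-in copy of the iris dataset; the original code listing is not reproduced in this text, so details such as the random seed are assumptions:

import numpy as np                    # imported to match the library list above
import pandas as pd                   # imported to match the library list above
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

# Load the iris dataset: sepal/petal lengths and widths, three target species
iris = load_iris()
X, y = iris.data, iris.target

# 80:20 train/test split (the random seed is an assumed value)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Train a decision tree classifier and predict on the test set
clf = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Classification metrics (macro averaging, since this is a multi-class task)
print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred, average="macro"))
print("Recall   :", recall_score(y_test, y_pred, average="macro"))
print("F1 score :", f1_score(y_test, y_pred, average="macro"))

# Plot the confusion matrix as a heatmap
cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, annot=True, fmt="d",
            xticklabels=iris.target_names, yticklabels=iris.target_names)
plt.xlabel("Predicted label")
plt.ylabel("True label")
plt.show()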